cs.CV (2024-05-16)

📊 9 papers total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 7: Motion Retargeting (1) · Pillar 2: RL Algorithms & Architecture (1)

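🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	Surveys 3D-LLMs: applications and challenges of multi-modal large language models in 3D tasks	NeRF, neural radiance field, scene understanding	✅
2	Toon3D: Seeing Cartoons from New Perspectives	Proposes Toon3D, which recovers 3D structure from geometrically inconsistent cartoon images	monocular depth, geometric consistency
3	4D Panoptic Scene Graph Generation	Proposes PSG-4D, a new representation and baseline model for dynamic 4D scene understanding	scene understanding, large language model
4	Towards Task-Compatible Compressible Representations	Proposes compressible, task-compatible representations to address performance issues in multi-task learning	depth estimation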

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
5	Libra: Building Decoupled Vision System on Large Language Models	Libra: builds a decoupled vision system on large language models to improve image-text understanding	large language model, foundation model, multimodal	✅
6	PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology	PRISM: a multi-modal generative foundation model for slide-level histopathology	foundation model
7	Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection	Grounding DINO 1.5: advances the "edge" of open-set object detection	zero-shot transfer	✅

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
8	AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale	Releases the AddBiomechanics dataset for capturing the physical properties of human motion at scale	human motion

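🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
9	A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts	Proposes a general-purpose brain lesion segmentation model based on a mixture of modality experts, enabling automatic segmentation of lesions across modalities	curriculum learning, foundation model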
