cs.CV (2024-05-16)

📊 9 papers in total | 🔗 3 with code

🎯 Topic Navigation

Pillar 3: Spatial Perception & Semantics (4 papers, 🔗 1 with code) | Pillar 9: Embodied Foundation Models (3 papers, 🔗 2 with code) | Pillar 7: Motion Retargeting (1 paper) | Pillar 2: RL Algorithms & Architecture (1 paper)

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

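# | Title | One-line summary | Tags | 🔗
1 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | A survey of 3D-LLMs: applications and challenges of multimodal large language models in 3D tasks | NeRF, neural radiance field, scene understanding |
2 | Toon3D: Seeing Cartoons from New Perspectives | Proposes Toon3D, which recovers 3D structure from cartoon images that are not geometrically consistent | monocular depth, geometric consistency |
3 | 4D Panoptic Scene Graph Generation | Proposes PSG-4D, a new representation and baseline model for dynamic 4D scene understanding | scene understanding, large language model |
4 | Towards Task-Compatible Compressible Representations | Proposes compressible, task-compatible representations to address performance issues in multi-task learning | depth estimation |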

🔬 Pillar 9: Embodied Foundation Models (3 papers)

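# | Title | One-line summary | Tags | 🔗
5 | Libra: Building Decoupled Vision System on Large Language Models | Libra builds a decoupled vision system on top of a large language model to improve image-text understanding | large language model, foundation model, multimodal |
6 | PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | PRISM is a multimodal generative foundation model for slide-level histopathology | foundation model |
7 | Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | Grounding DINO 1.5 pushes the "edge" of open-set object detection | zero-shot transfer |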

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
8 | AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale | Releases the AddBiomechanics dataset for capturing the physics of human motion at scale | human motion |

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

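# | Title | One-line summary | Tags | 🔗
9 | A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts | Proposes a general-purpose brain lesion segmentation model based on a mixture of modality experts, enabling automatic segmentation of lesions across multiple imaging modalities | curriculum learning, foundation model |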
