cs.CV (2024-05-16)
📊 9 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 7: Motion Retargeting (1)
Pillar 2: RL & Architecture (1)
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Surveys 3D-LLMs: applications and challenges of multimodal large language models in 3D tasks | NeRF neural radiance field scene understanding | ✅ | |
| 2 | Toon3D: Seeing Cartoons from New Perspectives | Proposes Toon3D, which recovers the underlying 3D structure of geometrically inconsistent cartoon scenes | monocular depth geometric consistency | | |
| 3 | 4D Panoptic Scene Graph Generation | Proposes PSG-4D: a new representation and baseline model for dynamic 4D scene understanding | scene understanding large language model | | |
| 4 | Towards Task-Compatible Compressible Representations | Proposes compressible, task-compatible representations to address performance issues in multi-task learning | depth estimation | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Libra: Building Decoupled Vision System on Large Language Models | Libra: builds a decoupled vision system on large language models to improve image-text understanding | large language model foundation model multimodal | ✅ | |
| 6 | PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | PRISM: a multimodal generative foundation model for slide-level histopathology | foundation model | | |
| 7 | Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | Grounding DINO 1.5: advances the "edge" of open-set object detection | zero-shot transfer | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale | Releases the AddBiomechanics dataset for capturing the physics of human motion at scale | human motion | | |
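🔬 Pillar 2: RL & Architecture (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts | Proposes a general-purpose brain lesion segmentation model based on a mixture of modality experts, enabling automatic lesion segmentation across multiple imaging modalities | curriculum learning foundation model | | |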