cs.CV (2024-12-03)
📊 10 papers in total
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (6)
Pillar 1: Robot Control (3)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
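| 1 | Personalized Multimodal Large Language Models: A Survey | A survey of personalized multimodal large language models: architectures, training, and applications | large language model, multimodal | ||
| 2 | WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | WSI-LLaVA: a multimodal large language model for whole slide image understanding | large language model, multimodal | ||
| 3 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | AV-Odyssey Bench: a comprehensive benchmark for evaluating how well multimodal LLMs understand audio-visual information | large language model, multimodal | ||
| 4 | MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | MVCTrack: enhances 3D point cloud tracking with multimodal-guided virtual cues | multimodal | ||
| 5 | Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks | Proposes adversarial environmental attacks that hijack the behavior of vision-and-language navigation agents | VLN | ||
| 6 | VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Proposes the VISCO benchmark for evaluating how well large models perform fine-grained critique and correction in visual reasoning | chain-of-thought | ||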
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
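| 7 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | Proposes the GOC framework, which combines Gaussian splatting with surface completion for editable object-level 3D reconstruction | OSC, manipulation, 3D gaussian splatting | ||
| 8 | GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos | GSGTrack: Gaussian splatting-based object pose tracking from RGB videos | manipulation, depth estimation, 3D gaussian splatting | ||
| 9 | Fast LiDAR Data Generation with Rectified Flows | Proposes R2Flow, a fast, high-fidelity LiDAR data generation model based on rectified flows, to speed up robotics applications | manipulation | ||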
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
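| 10 | It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model | Proposes a real-time method for generating two-person conversational motion, based on a reactive auto-regressive diffusion model | motion synthesis, motion generation | ||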