cs.CV (2024-12-03)

📊 10 papers in total

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

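| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Personalized Multimodal Large Language Models: A Survey | A survey of personalized multimodal large language models: architectures, training, and applications | large language model, multimodal |
| 2 | WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | WSI-LLaVA: a multimodal large language model for whole slide image understanding | large language model, multimodal |
| 3 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | AV-Odyssey Bench: a comprehensive benchmark that evaluates how well multimodal LLMs understand audio-visual information | large language model, multimodal |
| 4 | MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | MVCTrack: enhances 3D point cloud tracking with multimodal-guided virtual cues | multimodal |
| 5 | Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks | Proposes adversarial environmental attacks that hijack the behavior of vision-and-language navigation agents | VLN |
| 6 | VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Proposes the VISCO benchmark for evaluating how well large models perform fine-grained critique and correction in visual reasoning | chain-of-thought |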

🔬 Pillar 1: Robot Control (3 papers)

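| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 7 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | Proposes the GOC framework, which combines Gaussian splatting with surface completion for editable object-level 3D reconstruction | OSC, manipulation, 3D gaussian splatting |
| 8 | GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos | GSGTrack: Gaussian splatting-based object pose tracking from RGB videos | manipulation, depth estimation, 3D gaussian splatting |
| 9 | Fast LiDAR Data Generation with Rectified Flows | Proposes R2Flow, a fast, high-fidelity LiDAR data generation model based on rectified flows, to speed up robotics applications | manipulation |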

🔬 Pillar 4: Generative Motion (1 paper)

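| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model | Proposes a real-time method for generating two-person conversational motion, based on a reactive auto-regressive diffusion model | motion synthesis, motion generation |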
