cs.CV (2024-12-03)

📊 10 papers in total

🎯 Interest areas

Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

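#	Title	One-line summary	Tags
1	Personalized Multimodal Large Language Models: A Survey	Survey of personalized multimodal large language models: architectures, training, and applications	large language model, multimodal
2	WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	WSI-LLaVA: a multimodal large language model for whole-slide image understanding	large language model, multimodal
3	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	AV-Odyssey Bench: a comprehensive benchmark for evaluating how well multimodal LLMs understand audio-visual information	large language model, multimodal
4	MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues	MVCTrack: improves 3D point cloud tracking with multimodal-guided virtual cues	multimodal
5	Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks	Proposes adversarial environmental attacks that hijack the behavior of vision-and-language navigation agents	VLN
6	VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning	Proposes the VISCO benchmark for evaluating how well large models perform fine-grained critique and correction in visual reasoning	chain-of-thought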

🔬 Pillar 1: Robot Control (3 papers)

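#	Title	One-line summary	Tags
7	Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion	Proposes the GOC framework, which combines Gaussian splatting with surface completion for editable object-level 3D reconstruction	OSC, manipulation, 3D gaussian splatting
8	GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos	GSGTrack: Gaussian splatting-based object pose tracking from RGB videos	manipulation, depth estimation, 3D gaussian splatting
9	Fast LiDAR Data Generation with Rectified Flows	Proposes R2Flow, a rectified-flow-based model for fast, high-fidelity LiDAR data generation, aimed at speeding up robotics applications	manipulation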

🔬 Pillar 4: Generative Motion (1 paper)

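#	Title	One-line summary	Tags
10	It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model	Proposes a real-time method for generating two-person conversational motion, based on a reactive autoregressive diffusion model	motion synthesis, motion generation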
