cs.CV (2024-05-24)

📊 6 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (2) · Pillar 1: Robot Control (2 🔗1) · Pillar 2: RL & Architecture (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	Key Point	Tags	🔗	⭐
1	M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models	M4U: a new benchmark for evaluating the multilingual understanding and reasoning abilities of large multimodal models	large language model multimodal
2	DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception	DEEM: uses diffusion models to give large language models visual perception, improving their robustness	large language model multimodal

🔬 Pillar 1: Robot Control (2 papers)

#	Title	Key Point	Tags	🔗	⭐
3	iVideoGPT: Interactive VideoGPTs are Scalable World Models	iVideoGPT: scalable interactive VideoGPTs serving as world models for interactive exploration and decision-making	manipulation reinforcement learning world model	✅
4	Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies	SOFT: recasts generic pretrained Vision Transformers as object-centric scene encoders for manipulation policies	manipulation

🔬 Pillar 2: RL & Architecture (1 paper)

#	Title	Key Point	Tags	🔗	⭐
5	Score Distillation via Reparametrized DDIM	Improves Score Distillation through a reparametrized DDIM, raising the quality of 3D shape generation	distillation

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	Key Point	Tags	🔗	⭐
6	NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes	NeB-SLAM: a scalable RGB-D SLAM based on neural blocks, for unknown scenes	implicit representation
