cs.CV (2026-02-10)
📊 32 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (12 🔗4)
Pillar 3: Perception & Semantics (9)
Pillar 2: RL & Architecture (7 🔗2)
Pillar 1: Robot Control (3)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 3: Perception & Semantics (9 papers)
🔬 Pillar 2: RL & Architecture (7 papers)
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model | VideoAfford: uses a multimodal large language model to learn 3D affordances from human-object interaction videos | manipulation, affordance, human-object interaction | | |
| 30 | MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation | MVISTA-4D: a view-consistent 4D world model with test-time action inference for robotic manipulation | manipulation, world model | | |
| 31 | VideoWorld 2: Learning Transferable Knowledge from Real-world Videos | VideoWorld 2: proposes a dynamics-enhanced latent dynamics model that learns transferable knowledge from real-world videos | manipulation, latent dynamics | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
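| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 32 | Time2General: Learning Spatiotemporal Invariant Representations for Domain-Generalization Video Semantic Segmentation | Proposes the Time2General framework, which learns spatiotemporally invariant representations for domain-generalized video semantic segmentation | spatiotemporal | | |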