cs.CV(2025-12-16)
📊 6 papers in total
🎯 Interest Area Navigation
Pillar 2: RL Algorithms and Architecture (3)
Pillar 1: Robot Control (2)
Pillar 9: Embodied Foundation Models (1)
🔬 Pillar 2: RL Algorithms and Architecture (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling | WorldPlay: proposes a real-time interactive world modeling method with long-term geometric consistency. | world model, distillation, geometric consistency | | |
| 2 | A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning | A4-Agent: proposes a zero-shot embodied AI framework for reasoning about object interaction regions. | dreamer, affordance, embodied AI | | |
| 3 | TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs | TimeLens: rethinks video temporal grounding with multimodal LLMs and builds a high-quality baseline. | reinforcement learning, large language model, multimodal | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives | CRISP: a contact-guided Real2Sim method based on monocular video and planar scene primitives. | humanoid, humanoid control, real2sim | | |
| 5 | DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos | DRAW2ACT: proposes a depth-aware, trajectory-conditioned video generation framework for producing robot manipulation demonstration videos. | manipulation, embodied AI, multimodal | | |
🔬 Pillar 9: Embodied Foundation Models (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices | HyperVL: an efficient and dynamic multimodal large language model for edge devices. | large language model, multimodal | | |