cs.CV (2026-02-04)
📊 34 papers in total | 🔗 11 with code
🎯 Navigation by Area of Interest
Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 2: RL Algorithms & Architecture (8 🔗3)
Pillar 1: Robot Control (5 🔗2)
Pillar 4: Generative Motion (2 🔗1)
Pillar 3: Perception & Semantics (2 🔗2)
Pillar 6: Video Extraction & Matching (2)
Pillar 8: Physics-based Animation (2)
Pillar 5: Interaction & Reaction (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL Algorithms & Architecture (8 papers)
🔬 Pillar 1: Robot Control (5 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Natural Language Instructions for Scene-Responsive Human-in-the-Loop Motion Planning in Autonomous Driving using Vision-Language-Action Models | Uses vision-language-action models to enable scene-responsive, human-in-the-loop motion planning for autonomous driving | motion planning, vision-language-action, instruction following | ✅ | |
| 21 | AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation | AGILE: reconstructs hand-object interaction from video via agentic generation | manipulation, dexterous manipulation, contact-aware | | |
| 22 | CoWTracker: Tracking by Warping instead of Correlation | CoWTracker: a warping-based dense point tracking method that avoids computing cost volumes | manipulation, optical flow, spatiotemporal | | |
| 23 | SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking | SynthVerse: a large-scale, diverse synthetic dataset for point tracking | manipulation, foundation model | | |
| 24 | When and Where to Attack? Stage-wise Attention-Guided Adversarial Attack on Large Vision Language Models | Proposes SAGA, a stage-wise, attention-guided adversarial attack on large vision-language models | manipulation, multimodal | ✅ | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | DiMo: Discrete Diffusion Modeling for Motion Generation and Understanding | DiMo: a discrete diffusion model for motion generation and understanding that unifies bidirectional text-motion tasks | text-to-motion, motion generation, motion prediction | ✅ | |
| 26 | Laminating Representation Autoencoders for Efficient Diffusion | Proposes FlatDINO, which uses laminating representation autoencoders to efficiently compress DINOv2 features for diffusion models | classifier-free guidance | | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image | VecSet-Edit: leverages a pre-trained LRM for mesh editing from a single image | 3D Gaussian splatting, Gaussian splatting, splatting | ✅ | |
| 28 | JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction | JOintGS: jointly optimizes cameras, human bodies, and 3D Gaussians for in-the-wild monocular reconstruction | 3DGS, splatting, scene reconstruction | ✅ | |
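🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery | Proposes a depth-guided, metric-aware temporal consistency framework for human mesh recovery from monocular video | human mesh recovery | | |
| 30 | Temporal Slowness in Central Vision Drives Semantic Object Learning | Exploits temporal slowness in central vision to improve the object-level semantic representations learned by self-supervised learning | egocentric, Ego4D | | |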
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 31 | Adaptive 1D Video Diffusion Autoencoder | Proposes One-DVA, an adaptive one-dimensional video diffusion autoencoder for video compression and generation | spatiotemporal | | |
| 32 | HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating | HoloEv-Net: efficient event-based action recognition via holographic spatial embedding and global spectral gating | spatiotemporal | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 33 | A labeled dataset of simulated phlebotomy procedures for medical AI: polygon annotations for object detection and human-object interaction | Builds a dataset of simulated phlebotomy (blood-draw) procedures for object detection and human-object interaction research in medical AI | human-object interaction | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 34 | TrajVG: 3D Trajectory-Coupled Visual Geometry Learning | TrajVG: a trajectory-coupled visual geometry learning framework that improves multi-frame 3D reconstruction on videos containing motion | geometric consistency | | |