cs.CV (2025-10-02)
📊 33 papers in total | 🔗 9 with code
🎯 Navigation by Interest Area
Pillar 9: Embodied Foundation Models (13 🔗3)
Pillar 2: RL & Architecture (10 🔗3)
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 6: Video Extraction (2)
Pillar 1: Robot Control (1)
Pillar 4: Generative Motion (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (13 papers)
🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | VLA-R1: Enhancing Reasoning in Vision-Language-Action Models | Proposes VLA-R1 to address insufficient reasoning in vision-language-action models | reinforcement learning reward design affordance | ✅ | |
| 15 | GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation | GeoPurify achieves data-efficient open-vocabulary 3D segmentation through geometric distillation | distillation open-vocabulary open vocabulary | ✅ | |
| 16 | RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning | Proposes RewardMap, which tackles sparse rewards in fine-grained visual reasoning via multi-stage reinforcement learning | reinforcement learning reward design large language model | | |
| 17 | MultiModal Action Conditioned Video Generation | Proposes a multimodal action-conditioned video generation model that improves simulation accuracy for fine-grained robot manipulation | world model multimodal | | |
| 18 | DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing | DragFlow: unleashes DiT priors with region-based supervision for superior drag editing | flow matching large language model multimodal | | |
| 19 | Flow-Matching Guided Deep Unfolding for Hyperspectral Image Reconstruction | Proposes FMU, a flow-matching-guided deep unfolding network for hyperspectral image reconstruction | flow matching HSI | ✅ | |
| 20 | Towards Better Optimization For Listwise Preference in Diffusion Models | Proposes Diffusion-LPO for listwise preference optimization in diffusion models, improving image quality and preference alignment | reinforcement learning RLHF DPO | | |
| 21 | Discrete Facial Encoding: A Framework for Data-driven Facial Display Discovery | Proposes Discrete Facial Encoding (DFE) for data-driven discovery of facial displays, as an alternative to FACS | representation learning masked autoencoder VQ-VAE | | |
| 22 | Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback | Proposes the Oracle-RLAIF framework, which improves multi-modal video models through reinforcement learning from ranking feedback | reinforcement learning | | |
| 23 | Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning | Proposes a rollout-guided adaptive pixel-space reasoning framework that improves VLM efficiency and accuracy on fine-grained visual tasks | reinforcement learning multimodal | | |
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | LoBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction | LoBE-GS: load-balanced, efficient 3D Gaussian splatting for large-scale scene reconstruction | 3D gaussian splatting 3DGS gaussian splatting | | |
| 25 | StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions | StealthAttack: proposes a density-guided stealthy poisoning attack on 3D Gaussian splatting | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 26 | 4DGS-Craft: Consistent and Interactive 4D Gaussian Splatting Editing | Proposes 4DGS-Craft to address consistency in 4D Gaussian splatting editing | gaussian splatting splatting VGGT | | |
| 27 | Visual Odometry with Transformers | Proposes VoT, a Transformer-based visual odometry model for end-to-end monocular pose regression | visual odometry feature matching foundation model | | |
| 28 | GaussianMorphing: Mesh-Guided 3D Gaussians for Semantic-Aware Object Morphing | GaussianMorphing: proposes a mesh-guided 3D Gaussian method for semantic-aware object morphing | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 29 | Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale | Proposes Con-NRSfM, which solves non-rigid structure-from-motion via differential geometry with recoverable conformal scale | depth estimation | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | Clink! Chop! Thud! -- Learning Object Sounds from Real-World Interactions | Proposes a detection framework that learns object sounds from real-world interactions, addressing the association between sounds and objects | egocentric multimodal | | |
| 31 | Ego-Exo 3D Hand Tracking in the Wild with a Mobile Multi-Camera Rig | Proposes a mobile multi-camera rig for ego-exo 3D hand tracking in the wild | egocentric | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 32 | PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction | PhysHMR: learns humanoid control policies from vision for physically plausible human motion reconstruction | humanoid humanoid control reinforcement learning | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 33 | Learning to Generate Rigid Body Interactions with Video Diffusion Models | KineMask: generates videos with rigid-body interactions using video diffusion models | physically plausible | ✅ | |