cs.CV(2025-10-20)
📊 37 papers in total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 2: RL & Architecture (9)
Pillar 3: Perception & Semantics (8 🔗1)
Pillar 1: Robot Control (3 🔗1)
Pillar 6: Video Extraction (2)
Pillar 8: Physics-based Animation (2 🔗2)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL & Architecture (9 papers)
🔬 Pillar 3: Perception & Semantics (8 papers)
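| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models | Proposes Plug-and-Forecast, which uses multimodal large language models to augment motion forecasting models and improve generalization. | scene understanding motion prediction large language model | | |
| 22 | From Volume Rendering to 3D Gaussian Splatting: Theory and Applications | Surveys 3D Gaussian Splatting from volume rendering through to applications, addressing the challenges of real-time rendering and high-quality reconstruction. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 23 | RaindropGS: A Benchmark for 3D Gaussian Splatting under Raindrop Conditions | RaindropGS: a comprehensive benchmark for 3D Gaussian Splatting reconstruction under raindrop conditions. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 24 | Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS | Proposes ItG-GS, a stronger initialization pipeline that significantly improves reconstruction quality for sparse-view 3DGS. | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 25 | PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception | PAGE-4D: VGGT-4D perception for dynamic scenes with disentangled pose and geometry information. | depth estimation VGGT | | |
| 26 | Towards 3D Objectness Learning in an Open World | Proposes OP3Det to tackle generic object detection in open-world 3D scenes. | open-vocabulary open vocabulary foundation model | | |
| 27 | HouseTour: A Virtual Real Estate A(I)gent | HouseTour: proposes a method that uses diffusion models to generate spatially aware 3D camera trajectories and natural-language summaries for real-estate scenes. | 3D gaussian splatting gaussian splatting splatting | | |
| 28 | DeepDetect: Learning All-in-One Dense Keypoints | DeepDetect: proposes an end-to-end dense keypoint detection method that combines the strengths of classical detectors. | visual odometry | | |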
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | GSPlane: Concise and Accurate Planar Reconstruction via Structured Representation | GSPlane: concise and accurate planar reconstruction via a structured representation. | manipulation gaussian splatting splatting | | |
| 30 | SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving | SafeCoop: a full-stack safety defense framework for natural-language-based collaborative driving. | manipulation | ✅ | |
| 31 | ConsistEdit: Highly Consistent and Precise Training-free Visual Editing | ConsistEdit: proposes a training-free visual editing method with high consistency and precision. | manipulation | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 32 | ManzaiSet: A Multimodal Dataset of Viewer Responses to Japanese Manzai Comedy | Proposes ManzaiSet, a large-scale multimodal dataset for studying viewer responses to Japanese manzai comedy. | HuMoR multimodal | | |
| 33 | Leveraging AV1 motion vectors for Fast and Dense Feature Matching | Uses AV1 motion vectors for fast, dense feature matching, improving SfM efficiency. | feature matching | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 34 | ViBED-Net: Video Based Engagement Detection Network Using Face-Aware and Scene-Aware Spatiotemporal Cues | ViBED-Net: video engagement detection using face-aware and scene-aware spatiotemporal cues. | spatiotemporal | ✅ | |
| 35 | MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models | MUG-V 10B: a high-efficiency training framework for large-scale video generation models. | spatiotemporal | ✅ | |
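🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 36 | Capturing Head Avatar with Hand Contacts from a Monocular Video | Proposes a head avatar reconstruction method from monocular video that addresses modeling of deformations caused by hand interaction. | penetration spatial relationship | | |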
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 37 | ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling | ShapeCraft: uses LLM agents for structured, textured, and interactive 3D modeling. | spatial relationship | | |