cs.CV (2025-04-28)
📊 26 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (10 🔗4)
Pillar 2: RL & Architecture (5 🔗1)
Pillar 3: Perception & Semantics (5 🔗3)
Pillar 5: Interaction & Reaction (2)
Pillar 4: Generative Motion (2)
Pillar 7: Motion Retargeting (1)
Pillar 1: Robot Control (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback | CoherenDream: uses feedback from multimodal large language models to improve text coherence in 3D generation. | distillation, large language model, multimodal | | |
| 12 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Proposes MPEC for open-vocabulary 3D scene understanding, improving semantic segmentation and zero-shot capability. | contrastive learning, scene understanding, open-vocabulary | | |
| 13 | DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | DiVE: efficient multi-view driving scene generation based on a video diffusion Transformer. | distillation, classifier-free guidance, spatiotemporal | | |
| 14 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Mesh-Learner: differentiable mesh rendering and reconstruction using spherical harmonic textures. | reinforcement learning, 3D gaussian splatting, gaussian splatting | ✅ | |
| 15 | Taming the Randomness: Towards Label-Preserving Cropping in Contrastive Learning | Proposes a label-preserving cropping method that improves the robustness of contrastive learning in image classification. | contrastive learning | | |
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video | Proposes CoPE-NeRF, which jointly optimizes a neural radiance field and continuous camera motion for 3D reconstruction from monocular video. | depth estimation, NeRF, neural radiance field | ✅ | |
| 17 | STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Proposes STCOcc, which uses sparse spatial-temporal cascade renovation for 3D occupancy and scene flow prediction. | scene flow | ✅ | |
| 18 | CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes | CE-NPBG: a connectivity-enhanced neural point-based graphics method for novel view synthesis in autonomous driving scenes. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 19 | Category-Level and Open-Set Object Pose Estimation for Robotics | Studies category-level and open-set object pose estimation methods for robotics. | scene understanding, 6D pose estimation | | |
| 20 | MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion | MP-SfM: robust Structure-from-Motion using monocular surface priors. | monocular depth | ✅ | |
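🔬 Pillar 5: Interaction & Reaction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration | Proposes the Seg2HOI framework, which integrates segmentation models to strengthen human-object interaction prediction and achieve zero-shot generalization. | human-object interaction, HOI, foundation model | | |
| 22 | HOIGaze: Gaze Estimation During Hand-Object Interactions in Extended Reality Exploiting Eye-Hand-Head Coordination | HOIGaze: exploits eye-hand-head coordination to improve gaze estimation accuracy during hand-object interactions in extended reality. | HOI | | |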
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Physics-Informed Diffusion Models for SAR Ship Wake Generation from Text Prompts | Proposes a physics-informed diffusion model for generating SAR ship wakes from text prompts. | physics-informed diffusion | | |
| 24 | CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design | CasaGPT: proposes a cuboid-arrangement-based indoor scene synthesis method that improves scene realism. | physically plausible | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Learning Streaming Video Representation via Multitask Training | Proposes StreamFormer, which learns efficient streaming video representations through multitask training, suited to real-time applications. | spatial relationship, embodied AI | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | ShowMak3r: Compositional TV Show Reconstruction | ShowMak3r: proposes a compositional TV show scene reconstruction method for editing and manipulating actors and scenes. | manipulation, TAMP | ✅ | |