cs.CV (2025-07-05)
📊 11 papers in total | 🔗 1 with code
🎯 Interest area navigation
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 2: RL & Architecture (4)
Pillar 3: Perception & Semantics (2)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning | Proposes CoT-Segmenter, which uses chain-of-thought reasoning to enhance OOD detection in complex road scenes | large language model, foundation model, multimodal | | |
| 2 | Pedestrian Intention Prediction via Vision-Language Foundation Models | Proposes a pedestrian intention prediction method based on vision-language foundation models, improving prediction accuracy in autonomous driving scenarios | foundation model, multimodal | | |
| 3 | PresentAgent: Multimodal Agent for Presentation Video Generation | Proposes PresentAgent, which converts long documents into narrated presentation videos | large language model, multimodal | ✅ | |
| 4 | Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles | Driver-Net: fuses multi-camera information to assess driver take-over readiness in automated driving | multimodal | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation | Proposes the MTU3D framework, which bridges visual understanding and active exploration for efficient, general-purpose embodied navigation | reinforcement learning, representation learning, scene understanding | | |
| 6 | LVLM-Composer's Explicit Planning for Image Generation | LVLM-Composer: improves compositional understanding in image generation through explicit planning | reinforcement learning, spatial relationship, visual grounding | | |
| 7 | View Invariant Learning for Vision-Language Navigation in Continuous Environments | Proposes VIL, a view-invariant learning method that improves robustness to viewpoint changes in vision-language navigation in continuous environments | contrastive learning, teacher-student, embodied AI | | |
| 8 | EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation | EchoMimicV3: unified multi-modal, multi-task human animation with only 1.3 billion parameters | direct preference optimization, classifier-free guidance | | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments | ArmGS: a composite Gaussian appearance refinement method for modeling dynamic urban environments | 3D gaussian splatting, gaussian splatting, splatting | | |
| 10 | VISC: mmWave Radar Scene Flow Estimation using Pervasive Visual-Inertial Supervision | Proposes a mmWave radar scene flow estimation framework supervised by visual-inertial data, addressing data scarcity | scene flow | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal | Proposes a hierarchical semantic-visual fusion framework for long-range image haze removal | penetration, multimodal | | |