cs.CV (2025-07-05)

📊 11 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 3: Perception & Semantics (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

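| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning | Proposes CoT-Segmenter, which uses chain-of-thought reasoning to enhance OOD detection in complex road scenes. | large language model, foundation model, multimodal | |
| 2 | Pedestrian Intention Prediction via Vision-Language Foundation Models | Proposes a pedestrian intention prediction method based on vision-language foundation models, improving prediction accuracy in autonomous driving scenarios. | foundation model, multimodal | |
| 3 | PresentAgent: Multimodal Agent for Presentation Video Generation | Proposes PresentAgent, which converts long documents into narrated presentation videos. | large language model, multimodal | |
| 4 | Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles | Driver-Net: fuses multi-camera information to assess driver take-over readiness in automated driving. | multimodal | |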

🔬 Pillar 2: RL & Architecture (4 papers)

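| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 5 | Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation | Proposes the MTU3D framework, which bridges visual understanding and active exploration for efficient, general-purpose embodied navigation. | reinforcement learning, representation learning, scene understanding | |
| 6 | LVLM-Composer's Explicit Planning for Image Generation | LVLM-Composer: improves compositional understanding in image generation through explicit planning. | reinforcement learning, spatial relationship, visual grounding | |
| 7 | View Invariant Learning for Vision-Language Navigation in Continuous Environments | Proposes VIL, a view-invariant learning method that improves robustness to viewpoint changes in vision-language navigation in continuous environments. | contrastive learning, teacher-student, embodied AI | |
| 8 | EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation | EchoMimicV3: unified multi-modal, multi-task human animation with only 1.3B parameters. | direct preference optimization, classifier-free guidance | |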

🔬 Pillar 3: Perception & Semantics (2 papers)

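| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 9 | ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments | ArmGS: a composite Gaussian appearance refinement method for modeling dynamic urban environments. | 3D gaussian splatting, gaussian splatting, splatting | |
| 10 | VISC: mmWave Radar Scene Flow Estimation using Pervasive Visual-Inertial Supervision | Proposes a mmWave radar scene flow estimation framework supervised by visual-inertial data, addressing data scarcity. | scene flow | |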

🔬 Pillar 4: Generative Motion (1 paper)

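| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 11 | Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal | Proposes a hierarchical semantic-visual fusion framework for long-range image dehazing. | penetration, multimodal | |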
