cs.CV (2025-07-05)

📊 11 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 3: Perception & Semantics (2) · Pillar 4: Generative Motion (1)

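🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning	Proposes CoT-Segmenter, which uses chain-of-thought reasoning to improve OOD detection in complex road scenes.	large language model, foundation model, multimodal
2	Pedestrian Intention Prediction via Vision-Language Foundation Models	Proposes a pedestrian intention prediction method based on vision-language foundation models, improving prediction accuracy in autonomous driving scenarios.	foundation model, multimodal
3	PresentAgent: Multimodal Agent for Presentation Video Generation	Proposes PresentAgent, which converts long documents into narrated presentation videos.	large language model, multimodal	✅
4	Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles	Driver-Net: fuses multi-camera information to assess driver take-over readiness in automated driving.	multimodal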

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
5	Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation	Proposes the MTU3D framework, which bridges visual understanding and active exploration for efficient, versatile embodied navigation.	reinforcement learning, representation learning, scene understanding
6	LVLM-Composer's Explicit Planning for Image Generation	LVLM-Composer: improves compositional understanding in image generation through explicit planning.	reinforcement learning, spatial relationship, visual grounding
7	View Invariant Learning for Vision-Language Navigation in Continuous Environments	Proposes VIL, a view-invariant learning method that improves robustness to viewpoint changes in vision-language navigation in continuous environments.	contrastive learning, teacher-student, embodied AI
8	EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation	EchoMimicV3: unified multi-modal, multi-task human animation with only 1.3 billion parameters.	direct preference optimization, classifier-free guidance

🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments	ArmGS: a composite Gaussian appearance refinement method for modeling dynamic urban environments.	3D gaussian splatting, gaussian splatting, splatting
10	VISC: mmWave Radar Scene Flow Estimation using Pervasive Visual-Inertial Supervision	Proposes a mmWave radar scene flow estimation framework supervised by visual-inertial data, addressing data scarcity.	scene flow

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
11	Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal	Proposes a hierarchical semantic-visual fusion framework for long-range image haze removal.	penetration, multimodal
