cs.CV (2025-07-16)

📊 12 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 2: RL Algorithms & Architecture (3 🔗2) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 9: Embodied Foundation Models (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

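| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation | SGLoc: camera pose estimation from a 3D Gaussian Splatting representation using semantic information | 3D gaussian splatting, 3DGS, gaussian splatting |
| 2 | Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection | Funnel-HOI: a top-down perception framework for zero-shot human-object interaction detection | scene understanding, human-object interaction, HOI |
| 3 | PhysX-3D: Physical-Grounded 3D Asset Generation | Proposes the PhysX-3D framework for generating 3D assets with physical properties, which existing methods ignore | affordance, embodied AI |
| 4 | SpatialTrackerV2: 3D Point Tracking Made Easy | SpatialTrackerV2: a simple method for 3D point tracking in monocular video | monocular depth |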

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

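| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 5 | Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models | Proposes Mono-InternVL-1.5, a cheaper and faster monolithic multimodal large language model that lowers training and inference costs through an improved pre-training strategy and inference acceleration | visual pre-training, large language model, multimodal |
| 6 | Mitigating Object Hallucinations via Sentence-Level Early Intervention | Proposes the SENTINEL framework, which mitigates object hallucinations in multimodal large language models through sentence-level early intervention | preference learning, DPO, open-vocabulary |
| 7 | DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Proposes DVFL-Net, a lightweight distilled video focal modulation network for spatio-temporal action recognition | distillation, spatiotemporal |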

🔬 Pillar 1: Robot Control (2 papers)

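| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 8 | MindJourney: Test-Time Scaling with World Models for Spatial Reasoning | MindJourney: uses world models for test-time scaling to improve vision-language model performance on spatial reasoning tasks | manipulation, reinforcement learning, world model |
| 9 | Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios | Proposes a vision-based obstacle avoidance approach for autonomous driving that combines YOLOv11 with monocular depth estimation | motion planning, depth estimation, monocular depth |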

🔬 Pillar 4: Generative Motion (1 paper)

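| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 10 | MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding | Proposes MMHU, a large-scale multimodal benchmark for human behavior understanding, to support the development of safe driving systems | motion generation, multimodal |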

🔬 Pillar 9: Embodied Foundation Models (1 paper)

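| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 11 | UL-DD: A Multimodal Drowsiness Dataset Using Video, Biometric Signals, and Behavioral Data | Proposes UL-DD, a multimodal driver drowsiness detection dataset that combines video, biometric signals, and behavioral data | multimodal |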

🔬 Pillar 6: Video Extraction & Matching (1 paper)

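| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 12 | Spontaneous Spatial Cognition Emerges during Egocentric Video Viewing through Non-invasive BCI | Decodes spontaneous spatial cognition with a non-invasive BCI, revealing the spatial mapping mechanisms of the human brain | egocentric |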
