cs.CV (2025-07-12)

📊 16 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 1: Robot Control (1) · Pillar 3: Perception & Semantics (1 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

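| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models | Proposes the ProactiveVideoQA benchmark for evaluating the proactive interaction capabilities of video large language models, and introduces the PAUC evaluation metric. | large language model, multimodal, TAMP | |
| 2 | Online Long-term Point Tracking in the Foundation Model Era | Proposes Track-On for online long-term point tracking, achieving state-of-the-art results on multiple benchmarks. | embodied AI, foundation model | |
| 3 | Simplifying Traffic Anomaly Detection with Video Foundation Models | Uses video foundation models to simplify traffic anomaly detection, enabling efficient and scalable recognition of anomalous events. | foundation model | |
| 4 | Smart Routing for Multimodal Video Retrieval: When to Search What | ModaRoute: an LLM-based smart routing system for multimodal video retrieval that improves retrieval efficiency. | multimodal | |
| 5 | Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift | Proposes StaRFM, which combines FIP and CMP to improve the robustness and calibration of foundation models under distribution shift. | foundation model | |
| 6 | MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models | Proposes MCA-LLaVA to mitigate hallucination in large vision-language models. | multimodal | |
| 7 | PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | PoseLLM: enhances language-guided human pose estimation with MLP alignment. | large language model | |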

🔬 Pillar 2: RL & Architecture (4 papers)

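| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models | Proposes Prompt4Trust to address confidence calibration in multimodal large language models. | reinforcement learning, large language model, multimodal | |
| 9 | Stable Score Distillation | Proposes Stable Score Distillation to improve the stability and alignment of text-guided image and 3D editing. | distillation, NeRF, classifier-free guidance | |
| 10 | Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection | Geo-RepNet: geometry-aware representation learning for surgical phase recognition in endoscopic submucosal dissection. | representation learning, spatial relationship | |
| 11 | Cross Knowledge Distillation between Artificial and Spiking Neural Networks | Proposes a cross-modal knowledge distillation (CKD) method that improves SNN performance on DVS data. | distillation | |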

🔬 Pillar 1: Robot Control (1 paper)

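| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Multimodal Visual Transformer for Sim2real Transfer in Visual Reinforcement Learning | Proposes a Sim2Real transfer learning method based on a multimodal vision Transformer. | manipulation, sim2real, domain randomization | |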

🔬 Pillar 3: Perception & Semantics (1 paper)

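| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding | Fast3D: accelerates 3D multimodal large language models for efficient 3D scene understanding. | scene understanding, large language model | |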

🔬 Pillar 4: Generative Motion (1 paper)

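| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | SnapMoGen: Human Motion Generation from Expressive Texts | SnapMoGen: introduces a high-quality dataset for text-driven human motion generation and an improved generative model, MoMask++. | text-to-motion, motion generation, long-term motion generation | |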

🔬 Pillar 5: Interaction & Reaction (1 paper)

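| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | RoHOI: Robustness Benchmark for Human-Object Interaction Detection | Proposes the RoHOI benchmark for evaluating and improving the robustness of human-object interaction detection under real-world perturbations. | human-object interaction, HOI | |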

🔬 Pillar 6: Video Extraction (1 paper)

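| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | EgoAnimate: Generating Human Animations from Egocentric top-down Views | EgoAnimate: generates animatable human models from egocentric views. | egocentric | |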
