cs.CV (2025-07-12)

📊 16 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 1: Robot Control (1) · Pillar 3: Perception & Semantics (1 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 6: Video Extraction (1)

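🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models	Proposes the ProactiveVideoQA benchmark for evaluating the proactive interaction abilities of video large language models, along with the PAUC evaluation metric.	large language model multimodal TAMP	✅
2	Online Long-term Point Tracking in the Foundation Model Era	Proposes Track-On to address online long-term point tracking, reaching SOTA on multiple benchmarks.	embodied AI foundation model
3	Simplifying Traffic Anomaly Detection with Video Foundation Models	Uses video foundation models to simplify traffic anomaly detection, enabling efficient and scalable recognition of anomalous events.	foundation model	✅
4	Smart Routing for Multimodal Video Retrieval: When to Search What	ModaRoute: an LLM-based smart routing system for multimodal video retrieval that improves retrieval efficiency.	multimodal
5	Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift	Proposes StaRFM, which combines FIP and CMP to improve the robustness and calibration of foundation models under distribution shift.	foundation model
6	MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models	Proposes MCA-LLaVA to mitigate hallucination in large vision-language models.	multimodal	✅
7	PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment	PoseLLM: enhances language-guided human pose estimation with MLP alignment.	large language model	✅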

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
8	Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models	Proposes Prompt4Trust to address confidence calibration in multimodal large language models.	reinforcement learning large language model multimodal	✅
9	Stable Score Distillation	Proposes Stable Score Distillation to improve the stability and alignment of text-guided image and 3D editing.	distillation NeRF classifier-free guidance
10	Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection	Geo-RepNet: geometry-aware representation learning for surgical phase recognition in endoscopic submucosal dissection.	representation learning spatial relationship
11	Cross Knowledge Distillation between Artificial and Spiking Neural Networks	Proposes Cross Knowledge Distillation (CKD), a cross-modal distillation method that improves SNN performance on DVS data.	distillation	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Multimodal Visual Transformer for Sim2real Transfer in Visual Reinforcement Learning	Proposes a Sim2Real transfer learning method based on a multimodal vision Transformer.	manipulation sim2real domain randomization

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
13	Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding	Fast3D: accelerates 3D multimodal large language models for efficient 3D scene understanding.	scene understanding large language model	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
14	SnapMoGen: Human Motion Generation from Expressive Texts	SnapMoGen: presents a high-quality dataset for text-driven human motion generation and an improved generative model, MoMask++.	text-to-motion motion generation long-term motion generation	✅

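🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
15	RoHOI: Robustness Benchmark for Human-Object Interaction Detection	Proposes the RoHOI benchmark for evaluating and improving the robustness of human-object interaction detection under real-world perturbations.	human-object interaction HOI	✅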

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
16	EgoAnimate: Generating Human Animations from Egocentric top-down Views	EgoAnimate: generates animatable human models from egocentric views.	egocentric
