cs.CV (2025-06-06)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (10) · Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 6: Video Extraction & Matching (4 🔗2) · Pillar 2: RL Algorithms & Architecture (3 🔗2) · Pillar 1: Robot Control (3 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 3: Spatial Perception & Semantics (10 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments	Proposes Dy3DGS-SLAM to address monocular SLAM in dynamic environments	3D gaussian splatting, 3DGS, gaussian splatting
2	Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models	Proposes Pts3D-LLM to improve 3D scene understanding	scene understanding, large language model, multimodal
3	GS4: Generalizable Sparse Splatting Semantic SLAM	Proposes GS4 to address the shortcomings of traditional SLAM in semantic mapping	gaussian splatting, splatting, semantic mapping
4	Textile Analysis for Recycling Automation using Transfer Learning and Zero-Shot Foundation Models	Proposes a textile analysis method for recycling automation based on transfer learning and zero-shot models	open-vocabulary, open vocabulary, foundation model
5	STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving	Proposes STSBench to address spatio-temporal reasoning of multimodal large language models in autonomous driving	scene understanding, large language model
6	Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection	Proposes an iterative visual grounding framework to address generalization in visual relationship detection	scene understanding, embodied AI, large language model
7	Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery	Proposes CryoSPIRE to address biomolecule reconstruction in cryo-electron microscopy	gaussian splatting, splatting, scene reconstruction
8	HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios	Proposes HMVLM to address decision-making in long-tailed driving scenarios	scene understanding, chain-of-thought
9	Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues	Proposes an adaptive-depth-range MVS method to address aerial multi-view stereo reconstruction	depth estimation, feature matching
10	Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration	Proposes the Token Transforming framework to accelerate Vision Transformers while reducing information loss	depth estimation

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
11	Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models	Proposes Visual Graph Arena to address the visual conceptualization problem	large language model, multimodal
12	DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models	Proposes the DriveAction benchmark to address insufficient decision diversity in VLA models	vision-language-action, VLA
13	CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval	Proposes CLaMR to address multimodal video content retrieval	multimodal
14	MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks	Proposes MCA-Bench to evaluate CAPTCHA robustness against VLM-based attacks	multimodal
15	CoMemo: LVLMs Need Image Context with Image Memory	Proposes CoMemo to address LVLMs neglecting information when processing image context	large language model, multimodal	✅
16	VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning	Proposes VideoChat-A1 to address long-video understanding	large language model, multimodal
17	Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025	Proposes a cross-view multimodal object segmentation method for the Ego-Exo4D challenge	multimodal	✅
18	MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory	Proposes MoralCLIP to address the limited moral understanding of vision-language models	multimodal

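🔬 Pillar 6: Video Extraction & Matching (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge	Proposes an online mistake detection framework to address real-time error correction in industrial automation	egocentric, large language model
20	EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs	Proposes EASG-Bench to address question-answering challenges in long-video understanding	egocentric, large language model	✅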
21	Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Surveys cross-view collaborative intelligence to address viewpoint fusion in video understanding	egocentric	✅
22	O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views	Proposes O-MaMa to address object segmentation across different views	egocentric

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models	Proposes bootstrapping world models from dynamics models to address limitations of multimodal foundation models	world model, foundation model, multimodal
24	TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation	Proposes TerraFM to address unified learning over multisensor Earth observation data	contrastive learning, foundation model	✅
25	GazeNLQ @ Ego4D Natural Language Queries Challenge 2025	Proposes GazeNLQ to address the Ego4D natural language queries task	contrastive learning, egocentric, Ego4D	✅

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping	Proposes the YOEO method to address category-level 6D pose estimation of articulated objects	manipulation, 6D pose estimation
27	A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance	Proposes a deep learning method to address facial attribute manipulation and reconstruction in surveillance video	manipulation
28	Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation	Proposes contextual-semantic consistency learning to address the detection of multimodal media manipulation	manipulation	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
29	MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation	Proposes MOGO to address high-quality, real-time 3D human motion generation	text-to-motion, motion generation
