cs.CV (2025-06-06)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

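Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (10) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (8 🔗2) · Pillar 6: Video Extraction and Matching (Video Extraction) (4 🔗2) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (3 🔗2) · Pillar 1: Robot Control (Robot Control) (3 🔗1) · Pillar 4: Generative Motion (Generative Motion) (1)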

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (10 papers)

# | Title | One-line summary | Tags | 🔗
1 | Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments | Proposes Dy3DGS-SLAM for monocular SLAM in dynamic environments | 3D gaussian splatting 3DGS gaussian splatting
2 | Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models | Proposes Pts3D-LLM to improve 3D scene understanding | scene understanding large language model multimodal
3 | GS4: Generalizable Sparse Splatting Semantic SLAM | Proposes GS4 to address the shortcomings of traditional SLAM in semantic mapping | gaussian splatting splatting semantic mapping
4 | Textile Analysis for Recycling Automation using Transfer Learning and Zero-Shot Foundation Models | Proposes a transfer-learning and zero-shot-model approach to automated textile analysis for recycling | open-vocabulary open vocabulary foundation model
5 | STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Proposes STSBench to address spatio-temporal reasoning by multimodal large language models in autonomous driving | scene understanding large language model
6 | Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection | Proposes an iterative visual grounding framework to address generalization in visual relationship detection | scene understanding embodied AI large language model
7 | Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery | Proposes CryoSPIRE to address biomolecule reconstruction in cryo-electron microscopy | gaussian splatting splatting scene reconstruction
8 | HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios | Proposes HMVLM to address decision-making in long-tailed driving scenarios | scene understanding chain-of-thought
9 | Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues | Proposes an adaptive-depth-range MVS method to address aerial multi-view stereo reconstruction | depth estimation feature matching
10 | Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | Proposes the Token Transforming framework to accelerate Vision Transformers while reducing information loss | depth estimation

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (8 papers)

# | Title | One-line summary | Tags | 🔗
11 | Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models | Proposes Visual Graph Arena to address visual conceptualization | large language model multimodal
12 | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Proposes the DriveAction benchmark to address insufficient decision diversity in VLA models | vision-language-action VLA
13 | CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval | Proposes CLaMR to address multimodal video content retrieval | multimodal
14 | MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks | Proposes MCA-Bench to evaluate CAPTCHA robustness against VLM-based attacks | multimodal
15 | CoMemo: LVLMs Need Image Context with Image Memory | Proposes CoMemo to address LVLMs neglecting information when processing image context | large language model multimodal
16 | VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning | Proposes VideoChat-A1 to address long-video understanding | large language model multimodal
17 | Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025 | Proposes a cross-view multimodal object segmentation method for the Ego-Exo4D challenges | multimodal
18 | MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory | Proposes MoralCLIP to address the limited moral understanding of vision-language models | multimodal

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (4 papers)

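# | Title | One-line summary | Tags | 🔗
19 | Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge | Proposes an online mistake detection framework to address real-time error correction in industrial automation | egocentric large language model
20 | EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs | Proposes EASG-Bench to address question-answering challenges in long-video understanding | egocentric large language model
21 | Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision | Proposes cross-view collaborative intelligence to address viewpoint fusion in video understanding | egocentric
22 | O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views | Proposes O-MaMa to address object segmentation across different viewpoints | egocentric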

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (3 papers)

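# | Title | One-line summary | Tags | 🔗
23 | Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models | Proposes bootstrapping world models from dynamics models to address limitations of multimodal foundation models | world model foundation model multimodal
24 | TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation | Proposes TerraFM to address unified learning over multisensor Earth observation data | contrastive learning foundation model
25 | GazeNLQ @ Ego4D Natural Language Queries Challenge 2025 | Proposes GazeNLQ to address the Ego4D Natural Language Queries task | contrastive learning egocentric Ego4D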

🔬 Pillar 1: Robot Control (Robot Control) (3 papers)

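# | Title | One-line summary | Tags | 🔗
26 | You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping | Proposes YOEO to address category-level 6D pose estimation of articulated objects | manipulation 6D pose estimation
27 | A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance | Proposes a deep learning method to address facial attribute manipulation and reconstruction in surveillance video | manipulation
28 | Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation | Proposes contextual semantic consistency learning to address multimodal media manipulation detection | manipulation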

🔬 Pillar 4: Generative Motion (Generative Motion) (1 paper)

# | Title | One-line summary | Tags | 🔗
29 | MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation | Proposes MOGO to address high-quality, real-time 3D human motion generation | text-to-motion motion generation
