cs.CV (2025-07-02)

📊 12 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (3)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

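| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Evaluates the performance of multimodal models such as GPT-4o on standard computer vision tasks | foundation model, multimodal | |
| 2 | Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | Survey of methods, datasets, and challenges in using large language models for video crash detection | large language model, foundation model, multimodal | |
| 3 | ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Proposes the ESTR-CoT framework, which uses chain-of-thought reasoning to improve the accuracy and explainability of event-stream scene text recognition | large language model, chain-of-thought | |
| 4 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Proposes the ReasonBrain framework and the Reason50K dataset to tackle reasoning-based image editing | large language model, multimodal | |
| 5 | What does really matter in image goal navigation? | Proposes an end-to-end reinforcement learning method for image goal navigation | embodied AI | |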

🔬 Pillar 3: Perception & Semantics (4 papers)

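| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 6 | Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning with Vision Foundation Models | Proposes an underwater monocular depth estimation benchmark and fine-tunes vision foundation models on synthetic data | depth estimation, monocular depth, metric depth | |
| 7 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Proposes a zero-shot multimodal compound expression recognition method that achieves high performance without training on target data | scene understanding, multimodal | |
| 8 | ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving | The W-CODA workshop aims to explore next-generation solutions for autonomous-driving corner cases through multimodal perception and comprehension techniques | scene understanding, multimodal | |
| 9 | Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation | Proposes the Snake-NeRF framework, which scales NeRF from local to global 3D Earth observation | NeRF, neural radiance field | |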

🔬 Pillar 2: RL & Architecture (3 papers)

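| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 10 | Kwai Keye-VL Technical Report | Proposes Kwai Keye-VL to improve the performance of multimodal large models on short-video understanding | reinforcement learning, large language model, foundation model | |
| 11 | A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs | TANGERINE: a computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening | masked autoencoder, foundation model | |
| 12 | Robust brain age estimation from structural MRI with contrastive learning | Proposes contrastive learning to make brain age estimation from structural MRI more robust | MAE, contrastive learning, foundation model | |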
