cs.CV (2025-07-02)

📊 12 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (3)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks	Evaluates how multimodal models such as GPT-4o perform on standard computer vision tasks	foundation model multimodal
2	Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Survey of methods, datasets, and challenges in using large language models for crash detection in video	large language model foundation model multimodal
3	ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning	Proposes the ESTR-CoT framework, which uses chain-of-thought reasoning to improve the accuracy and explainability of event-stream scene text recognition	large language model chain-of-thought	✅
4	Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning	Proposes the ReasonBrain framework and the Reason50K dataset to tackle reasoning-based image editing	large language model multimodal
5	What does really matter in image goal navigation?	Proposes an end-to-end reinforcement learning approach to image goal navigation	embodied AI

🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
6	Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning with Vision Foundation Models	Proposes an underwater monocular depth estimation benchmark and fine-tunes vision foundation models on synthetic data	depth estimation monocular depth metric depth
7	Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach	Proposes a zero-shot multimodal compound expression recognition method that achieves high performance without training on target data	scene understanding multimodal
8	ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving	The W-CODA workshop explores next-generation solutions to autonomous driving corner cases through multimodal perception and comprehension	scene understanding multimodal
9	Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation	Proposes the Snake-NeRF framework, which scales NeRF from local to global 3D Earth observation	NeRF neural radiance field

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
10	Kwai Keye-VL Technical Report	Proposes Kwai Keye-VL to improve multimodal large models on short-video understanding	reinforcement learning large language model foundation model
11	A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs	TANGERINE: a computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening	masked autoencoder foundation model
12	Robust brain age estimation from structural MRI with contrastive learning	Proposes contrastive learning to make brain age estimation from structural MRI more robust	MAE contrastive learning foundation model
