cs.CV (2025-07-02)
📊 12 papers in total | 🔗 1 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 3: Spatial Perception & Semantics (4)
Pillar 2: RL Algorithms & Architecture (3)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Evaluates how well multimodal models such as GPT-4o perform on standard computer vision tasks | foundation model, multimodal | | |
| 2 | Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | Survey of methods, datasets, and challenges in using large language models to detect crashes in video | large language model, foundation model, multimodal | | |
| 3 | ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning | Proposes the ESTR-CoT framework, which uses chain-of-thought reasoning to improve the accuracy and explainability of event-stream-based scene text recognition | large language model, chain-of-thought | ✅ | |
| 4 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Proposes the ReasonBrain framework and the Reason50K dataset to tackle reasoning-based image editing | large language model, multimodal | | |
| 5 | What does really matter in image goal navigation? | Proposes an end-to-end reinforcement learning approach to image goal navigation | embodied AI | | |
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning with Vision Foundation Models | Introduces real-world benchmarks for underwater monocular metric depth estimation and fine-tunes vision foundation models on synthetic data | depth estimation, monocular depth, metric depth | | |
| 7 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Proposes a zero-shot multimodal compound expression recognition approach that performs well without training on target-domain data | scene understanding, multimodal | | |
| 8 | ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving | The W-CODA workshop explores next-generation solutions for corner cases in autonomous driving through multimodal perception and comprehension | scene understanding, multimodal | | |
| 9 | Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation | Proposes the Snake-NeRF framework, which scales NeRF from local to global 3D Earth observation | NeRF, neural radiance field | | |
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Kwai Keye-VL Technical Report | Presents Kwai Keye-VL, a multimodal large language model aimed at improving short-video understanding | reinforcement learning, large language model, foundation model | | |
| 11 | A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs | TANGERINE: a computationally frugal, open-source foundation model for detecting thoracic diseases in lung cancer screening | masked autoencoder, foundation model | | |
| 12 | Robust brain age estimation from structural MRI with contrastive learning | Uses contrastive learning to make brain age estimation from structural MRI more robust | MAE, contrastive learning, foundation model | | |