cs.CV (2025-03-17)

📊 6 papers | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 2: RL & Architecture (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Introduces MicroVQA, a benchmark dataset for evaluating multimodal reasoning on microscopy images. | large language model, multimodal, chain-of-thought | |
| 2 | Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning | Proposes 3D reasoning-driven planning, addressing implicit intent understanding and route planning in embodied AI. | embodied AI, multimodal | |
| 3 | Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning | Proposes mmSSR, a method for efficiently and scalably selecting high-quality, diverse multimodal instruction fine-tuning data. | large language model | |

🔬 Pillar 3: Perception & Semantics (2 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 4 | MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs | MM-Spatial explores the 3D spatial understanding capabilities of multimodal LLMs. | depth estimation, monocular depth, metric depth | |
| 5 | NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Introduces the NuPlanQA dataset and the BEV-LLM model to improve multimodal LLMs' understanding of autonomous-driving scenes. | scene understanding, large language model | |

🔬 Pillar 2: RL & Architecture (1 paper)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | DeepPerception enhances cognitive visual perception in MLLMs for knowledge-intensive visual grounding. | reinforcement learning, large language model, multimodal | |
