cs.CV (2025-03-17)
📊 6 papers total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 3: Perception & Semantics (2 🔗1)
Pillar 2: RL & Architecture (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Proposes MicroVQA, a benchmark dataset for evaluating multimodal reasoning on microscopy images. | large language model, multimodal, chain-of-thought | ✅ | |
| 2 | Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning | Proposes 3D reasoning-driven planning to address implicit intention understanding and route planning in embodied AI. | embodied AI, multimodal | | |
| 3 | Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning | Proposes mmSSR, a method for efficient, scalable selection of high-quality, diverse multimodal instruction fine-tuning data. | large language model | | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs | MM-Spatial: exploring 3D spatial understanding in multimodal LLMs. | depth estimation, monocular depth, metric depth | | |
| 5 | NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Proposes the NuPlanQA dataset and the BEV-LLM model to improve driving-scene understanding in multimodal large language models. | scene understanding, large language model | ✅ | |
🔬 Pillar 2: RL & Architecture (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | DeepPerception: improving MLLMs' cognitive visual perception for knowledge-intensive visual grounding. | reinforcement learning, large language model, multimodal | ✅ | |