cs.CV (2025-09-17)

📊 5 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

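| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | M-PACE: Mother Child Framework for Multimodal Compliance | M-PACE: a mother-child framework for multimodal compliance that significantly reduces inference cost. | large language model, multimodal | |
| 2 | Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR | Baseer: a vision-language model for Arabic document OCR that sets a new state of the art. | large language model, multimodal | |
| 3 | ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding | ViSpec: accelerates vision-language model inference with vision-aware speculative decoding. | large language model, multimodal | |
| 4 | Dense Video Understanding with Gated Residual Tokenization | Proposes the Gated Residual Tokenization (GRT) framework for efficient high-frame-rate video understanding. | large language model | |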

🔬 Pillar 1: Robot Control (1 paper)

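| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 5 | A Generalization of CLAP from 3D Localization to Image Processing, A Connection With RANSAC & Hough Transforms | Generalizes the CLAP algorithm from 3D localization to image stitching and reveals its connection to RANSAC and Hough transforms. | humanoid | |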
