cs.CV(2025-09-17)

📊 5 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

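| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | M-PACE: Mother Child Framework for Multimodal Compliance | M-PACE: a mother-child framework for multimodal compliance that lowers review costs and improves efficiency | large language model, multimodal | |
| 2 | Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR | Baseer: a vision-language model for Arabic document OCR that significantly improves recognition accuracy | large language model, multimodal | |
| 3 | ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding | ViSpec: accelerates vision-language models with vision-aware speculative decoding | large language model, multimodal | |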

🔬 Pillar 7: Motion Retargeting (1 paper)

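| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 4 | Dense Video Understanding with Gated Residual Tokenization | Proposes a dense video understanding method to address the low-frame-rate sampling problem | motion estimation, large language model | |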

🔬 Pillar 1: Robot Control (1 paper)

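| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 5 | A Generalization of CLAP from 3D Localization to Image Processing, A Connection With RANSAC & Hough Transforms | Generalizes the CLAP algorithm from 3D localization to image stitching and reveals its connection to RANSAC and Hough transforms | humanoid | |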
