cs.CV (2025-09-17)
📊 5 papers in total | 🔗 1 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | M-PACE: Mother Child Framework for Multimodal Compliance | M-PACE: a mother-child framework for multimodal compliance that reduces review costs and improves efficiency | large language model, multimodal | | |
| 2 | Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR | Baseer: a vision-language model for Arabic document OCR that significantly improves recognition accuracy | large language model, multimodal | | |
| 3 | ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding | ViSpec: accelerates vision-language models with vision-aware speculative decoding | large language model, multimodal | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | Dense Video Understanding with Gated Residual Tokenization | Proposes a dense video understanding method to address the problem of low-frame-rate sampling | motion estimation, large language model | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | A Generalization of CLAP from 3D Localization to Image Processing, A Connection With RANSAC & Hough Transforms | Generalizes the CLAP algorithm from 3D localization to image stitching and reveals its connection to RANSAC and the Hough transform | humanoid | | |