cs.CV (2025-09-17)

📊 5 papers in total | 🔗 1 with code

🎯 Topics of Interest

Pillar 9: Embodied Foundation Models (3, 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

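#	Title	One-line summary	Tags	🔗	⭐
1	M-PACE: Mother Child Framework for Multimodal Compliance	M-PACE: a mother-child framework for multimodal compliance that lowers review costs and improves efficiency	large language model multimodal
2	Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR	Baseer: a vision-language model for Arabic document OCR that substantially improves recognition accuracy	large language model multimodal
3	ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding	ViSpec: accelerates vision-language models with vision-aware speculative decoding	large language model multimodal	✅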

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
4	Dense Video Understanding with Gated Residual Tokenization	Proposes a dense video understanding method to overcome the limitations of low-frame-rate sampling	motion estimation large language model

🔬 Pillar 1: Robot Control (1 paper)

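#	Title	One-line summary	Tags	🔗	⭐
5	A Generalization of CLAP from 3D Localization to Image Processing, A Connection With RANSAC & Hough Transforms	Generalizes the CLAP algorithm from 3D localization to image stitching and shows its connection to RANSAC and Hough transforms	humanoid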
