Scalable Trajectory Generation for Whole-Body Mobile Manipulation

Authors: Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu

Categories: cs.RO, cs.CV

Published: 2026-04-14

💡 One-Sentence Summary

AutoMoMa: a scalable trajectory generation framework for whole-body mobile manipulation that accelerates robot learning.

🎯 Matched Area: Pillar 1: Robot Control

Keywords: mobile manipulation, trajectory generation, imitation learning, robot learning, GPU acceleration

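📋 Key Points

Existing mobile manipulation methods struggle to generate large-scale, physically valid trajectory data in complex environments, which holds back robot learning.
AutoMoMa generates whole-body motion trajectories efficiently by unifying AKR modeling with parallelized trajectory optimization.
Experiments show that datasets generated by AutoMoMa can train imitation learning policies effectively: given tens of thousands of demonstrations, SOTA methods reach about 80% success on a single articulated-object task.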

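📝 Abstract (Translated from Chinese)

This paper presents AutoMoMa, a GPU-accelerated framework for generating large-scale, physically valid trajectory data for whole-body mobile manipulation. AutoMoMa unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. The framework generates 5,000 episodes per GPU-hour (over 80× faster than CPU-based baselines), producing a dataset of over 500k trajectories that spans 330 scenes, diverse articulated objects, and multiple robot embodiments. Training downstream imitation learning (IL) policies shows that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach about 80% success, indicating that data scarcity rather than algorithmic limitations has been the bottleneck. AutoMoMa bridges the gap between high-performance planning and reliable IL-based control, providing infrastructure for coordinated mobile manipulation research.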

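🔬 Method Details

Problem definition: Existing ways of acquiring mobile manipulation data, such as teleoperation and planning, struggle to produce large-scale, physically valid trajectory data. Teleoperation is labor-intensive and planning is computationally expensive, so neither scales well to complex environments and diverse robot embodiments. The resulting data scarcity limits the performance of IL-based control policies.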

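Core idea: AutoMoMa generates trajectories efficiently by combining a unified kinematic representation with parallelized trajectory optimization. Specifically, it consolidates base, arm, and object kinematics into a single AKR chain, which simplifies kinematic modeling. It then uses GPU parallelism to accelerate trajectory optimization, which substantially raises data generation throughput.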
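To make the "single chain" idea concrete, here is a minimal planar sketch. It is not the paper's AKR formulation: the joint layout, the link lengths (`link1`, `link2`, `door_radius`), and the function names are all hypothetical. It only illustrates how base, arm, and object joints can share one configuration vector and one forward-kinematics pass.

```python
import math

def se2(x, y, theta):
    """3x3 homogeneous transform: translate by (x, y), then rotate by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def chain_end_point(q, link1=0.5, link2=0.4, door_radius=0.6):
    """End point of one chain spanning base, arm, and object joints.

    q = [base_x, base_y, base_yaw, shoulder, elbow, door_angle].
    All dimensions are illustrative assumptions, not values from the paper.
    """
    bx, by, byaw, shoulder, elbow, door = q
    transforms = [
        se2(bx, by, byaw),          # mobile base treated as virtual joints
        se2(0.0, 0.0, shoulder),    # arm joint 1
        se2(link1, 0.0, 0.0),       # arm link 1
        se2(0.0, 0.0, elbow),       # arm joint 2
        se2(link2, 0.0, 0.0),       # arm link 2, ending at the grasped handle
        se2(0.0, 0.0, door),        # object joint, appended to the same chain
        se2(door_radius, 0.0, 0.0), # object link from handle to hinge
    ]
    pose = se2(0.0, 0.0, 0.0)
    for t in transforms:
        pose = matmul(pose, t)
    return pose[0][2], pose[1][2]

def hinge_residual(q, hinge_xy):
    """Distance between the chain end and a fixed hinge location in the world."""
    x, y = chain_end_point(q)
    return math.hypot(x - hinge_xy[0], y - hinge_xy[1])

if __name__ == "__main__":
    q = [1.0, 2.0, 0.0, math.pi / 2, -math.pi / 2, 0.0]
    print(chain_end_point(q))
    print(hinge_residual(q, (2.0, 2.5)))
```

In this toy model, an optimizer would search over the whole vector `q` while driving `hinge_residual` toward zero, so base, arm, and object motion are solved together rather than in separate stages.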

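Framework: AutoMoMa consists of three main modules: 1) AKR modeling, which consolidates the robot base, the arm, and the manipulated object into a single kinematic chain; 2) parallel trajectory optimization, which uses GPU parallelism to optimize robot trajectories and ensure that they are physically valid; 3) dataset generation, which stores the resulting trajectories as a dataset for training downstream IL policies.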

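Key innovation: AutoMoMa combines AKR modeling with parallel trajectory optimization to generate whole-body mobile manipulation trajectories efficiently and at scale. Prior datasets had to compromise on scale, diversity, or kinematic validity; AutoMoMa addresses all three at once.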

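Key design: AutoMoMa's key design elements are: 1) the AKR modeling approach, which simplifies the kinematic representation of complex robots; 2) a GPU-based parallel trajectory optimization algorithm, which speeds up trajectory generation; 3) task-specific reward functions, which guide the trajectory optimization.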
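The batched structure of parallel trajectory optimization can be sketched as follows. This is a toy under stated assumptions, not AutoMoMa's optimizer: it runs in pure Python on the CPU, uses a single 1-D joint, and minimizes only a smoothness cost with fixed endpoints. The function names (`smoothness_cost`, `optimize_batch`, `make_batch`) are hypothetical. The point is that every trajectory in the batch receives the same update rule at each step, which is the dimension a GPU implementation would parallelize.

```python
import random

def smoothness_cost(traj):
    """Sum of squared differences between consecutive waypoints."""
    return sum((b - a) ** 2 for a, b in zip(traj, traj[1:]))

def smoothness_grad(traj):
    """Gradient of smoothness_cost; endpoints get zero gradient (held fixed)."""
    grad = [0.0] * len(traj)
    for i in range(1, len(traj) - 1):
        grad[i] = 2.0 * (2.0 * traj[i] - traj[i - 1] - traj[i + 1])
    return grad

def optimize_batch(batch, steps=500, lr=0.1):
    """Run gradient descent on every trajectory in the batch in lockstep.

    The per-trajectory loop stands in for a GPU batch dimension (assumption).
    """
    batch = [list(t) for t in batch]
    for _ in range(steps):
        grads = [smoothness_grad(t) for t in batch]
        batch = [[x - lr * g for x, g in zip(t, g_t)]
                 for t, g_t in zip(batch, grads)]
    return batch

def make_batch(n_traj=4, n_waypoints=8, seed=0):
    """Random initial guesses sharing the same start (0.0) and goal (1.0)."""
    rng = random.Random(seed)
    return [[0.0] + [rng.uniform(-1.0, 1.0) for _ in range(n_waypoints - 2)] + [1.0]
            for _ in range(n_traj)]

if __name__ == "__main__":
    initial = make_batch()
    optimized = optimize_batch(initial)
    for before, after in zip(initial, optimized):
        print(round(smoothness_cost(before), 4), "->", round(smoothness_cost(after), 4))
```

With only a smoothness term, each trajectory converges to a straight line between its endpoints. A realistic cost would add terms such as collision avoidance, joint limits, and task constraints, which this sketch omits.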

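📊 Experimental Highlights

AutoMoMa generates 5,000 episodes per GPU-hour, over 80× faster than CPU-based baselines. When IL policies are trained on AutoMoMa-generated data for articulated-object manipulation, even a single task requires tens of thousands of demonstrations for SOTA methods to reach about 80% success, which underlines how much data scale matters for mobile manipulation learning.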
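A quick back-of-the-envelope calculation shows what these throughput figures imply. It rests on assumptions the summary does not state: that each trajectory counts as one episode, that "over 80×" and "over 500k" are taken at their lower bounds, and that the baseline rate is expressed per hour of CPU compute.

```python
# Figures quoted in the summary, taken at their stated lower bounds.
EPISODES_PER_GPU_HOUR = 5_000
SPEEDUP_OVER_CPU = 80
DATASET_SIZE = 500_000

# Assumption: one trajectory corresponds to one episode.
gpu_hours = DATASET_SIZE / EPISODES_PER_GPU_HOUR

# Implied upper bound on the CPU baseline rate, since the speedup is "over 80x".
cpu_episodes_per_hour = EPISODES_PER_GPU_HOUR / SPEEDUP_OVER_CPU

# Implied lower bound on baseline compute time for the same dataset.
cpu_hours = DATASET_SIZE / cpu_episodes_per_hour

print(f"GPU-hours for the dataset: {gpu_hours}")              # 100.0
print(f"CPU baseline episodes/hour: {cpu_episodes_per_hour}")  # 62.5
print(f"CPU baseline hours: {cpu_hours}")                      # 8000.0
```

Under these assumptions, the dataset takes roughly 100 GPU-hours, while a CPU baseline would need at least 8,000 hours of compute for the same number of episodes.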

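🎯 Application Scenarios

AutoMoMa provides an efficient data generation tool for mobile manipulation, with potential applications in home service robots, logistics robots, and industrial robots. Large-scale, high-quality training data can improve a robot's ability to carry out complex manipulation tasks in unstructured environments, such as grasping, placing, and assembling objects. The work helps move robotics toward deployment in real-world settings.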

📄 Abstract (Original)

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than those sufficient for fixed-base manipulation. Yet existing acquisition methods, including teleoperation and planning, are either labor-intensive or computationally prohibitive at scale. The core bottleneck is the lack of a scalable pipeline for generating large-scale, physically valid, coordinated trajectory data across diverse embodiments and environments. Here we introduce AutoMoMa, a GPU-accelerated framework that unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. AutoMoMa achieves 5,000 episodes per GPU-hour (over $80\times$ faster than CPU-based baselines), producing a dataset of over 500k physically valid trajectories spanning 330 scenes, diverse articulated objects, and multiple robot embodiments. Prior datasets were forced to compromise on scale, diversity, or kinematic fidelity; AutoMoMa addresses all three simultaneously. Training downstream IL policies further reveals that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach $\approx 80\%$ success, confirming that data scarcity -- not algorithmic limitations -- has been the binding constraint. AutoMoMa thus bridges high-performance planning and reliable IL-based control, providing the infrastructure previously missing for coordinated mobile manipulation research. By making large-scale, kinematically valid training data practical, AutoMoMa showcases generalizable whole-body robot policies capable of operating in the diverse, unstructured settings of the real world.
