DaDu-Corki: Algorithm-Architecture Co-Design for Embodied AI-powered Robotic Manipulation

📄 arXiv: 2407.04292v5

Authors: Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan

Categories: cs.AR, cs.RO

Published: 2024-07-05 (updated: 2025-06-08)

DOI: 10.1145/3695053.3731099


💡 One-sentence takeaway

Proposes the Corki framework to reduce latency and energy consumption in robotic manipulation.

🎯 Matched areas: Pillar 1: Robot Control; Pillar 9: Embodied Foundation Models

Keywords: embodied AI, robotic manipulation, algorithm-architecture co-design, large language models, real-time systems

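📋 Key points

  1. Existing robotic manipulation systems predict an action for one frame at a time, which leads to high latency and energy consumption.
  2. Corki decouples LLM inference, robotic control, and data communication in the compute pipeline.
  3. In the reported experiments, Corki reduces LLM inference frequency by up to 5.1×, yielding up to a 5.9× speedup, and improves success rate by up to 13.9%.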

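📝 Abstract (condensed)

Embodied AI robots have the potential to fundamentally improve the way people live and manufacture. Continued progress in using large language models (LLMs) to control robots depends on an efficient computing substrate. Today's computing systems for robotic manipulation are designed mainly around the interests of algorithm developers, which results in high latency and energy consumption. This paper proposes Corki, an algorithm-architecture co-design framework for real-time embodied AI-powered robotic manipulation. By predicting a near-future trajectory instead of a single-frame action, Corki reduces the frequency of LLM inference, which improves both speed and success rate.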

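🔬 Method details

Problem definition: The paper targets the high latency and energy consumption caused by single-frame action prediction in existing robotic manipulation systems. Existing approaches use compute resources inefficiently, which hurts real-time manipulation performance.

Core idea: Corki predicts a trajectory for the near future rather than a single action. Because one inference covers several frames, the LLM runs less often, which improves responsiveness and energy efficiency.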
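To make the frequency argument concrete, here is a minimal sketch, not the authors' implementation. It assumes each LLM inference yields a trajectory covering a fixed number of control frames (`horizon`, a hypothetical parameter); this summary does not say how Corki chooses trajectory length.

```python
import math


def llm_calls(total_steps: int, horizon: int) -> int:
    """Count LLM inferences needed to cover `total_steps` control frames
    when each inference produces a trajectory spanning `horizon` frames.

    `horizon` is an illustrative assumption; horizon=1 models the
    single-frame baseline.
    """
    if total_steps < 1 or horizon < 1:
        raise ValueError("total_steps and horizon must be >= 1")
    return math.ceil(total_steps / horizon)


def reduction_factor(total_steps: int, horizon: int) -> float:
    """Ratio of baseline (per-frame) inferences to trajectory-based inferences."""
    return llm_calls(total_steps, 1) / llm_calls(total_steps, horizon)
```

Under this idealized model, a 100-frame episode with `horizon=5` needs 20 inferences instead of 100. The figures reported for Corki (up to 5.1×) come from the paper's experiments, not from this formula.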

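Technical framework: Corki comprises three main components: 1) an LLM inference module that predicts trajectories; 2) a hardware accelerator that converts predicted trajectories into the control (torque) signals that drive the robot; 3) an execution pipeline that runs data communication in parallel with computation.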
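The benefit of running communication in parallel with computation can be illustrated with a simple latency model. This is a sketch under stated assumptions: the function names and the idea that communication is fully hidden behind computation are illustrative simplifications, not details taken from the paper.

```python
def sequential_latency(compute_ms: float, comm_ms: float) -> float:
    """Latency of one step when communication waits for computation to finish."""
    return compute_ms + comm_ms


def overlapped_latency(compute_ms: float, comm_ms: float) -> float:
    """Latency of one step when communication runs concurrently with computation.

    Idealized assumption: the two stages overlap perfectly, so the slower
    stage determines the step latency.
    """
    return max(compute_ms, comm_ms)
```

For example, with hypothetical values of 200 ms of computation and 20 ms of communication, the sequential model gives 220 ms per step while the overlapped model gives 200 ms. Real pipelines rarely overlap perfectly, so this is an upper bound on the saving.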

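Key innovation: Corki co-designs the algorithm and the hardware. The algorithm lowers LLM inference frequency, and the pipeline overlaps data communication with computation to improve end-to-end performance.

Key design: According to this summary, Corki uses a dedicated loss function to optimize trajectory prediction, and the hardware accelerator is designed to generate control signals quickly enough for real-time manipulation.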
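One step in turning a predicted trajectory into control signals is sampling a setpoint at each control tick. The sketch below shows only that step, using linear interpolation over a one-dimensional waypoint list. It is an assumption for illustration: the summary does not describe Corki's trajectory representation, and the torque computation performed by the accelerator is not modeled here.

```python
from typing import Sequence


def sample_trajectory(waypoints: Sequence[float],
                      times: Sequence[float],
                      t: float) -> float:
    """Linearly interpolate a 1-D waypoint trajectory at time `t`.

    `waypoints[i]` is the target value at `times[i]`; `times` must be
    strictly increasing. Queries outside the range clamp to the endpoints.
    """
    if len(waypoints) != len(times) or len(waypoints) < 2:
        raise ValueError("need at least two waypoints with matching times")
    if t <= times[0]:
        return waypoints[0]
    if t >= times[-1]:
        return waypoints[-1]
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)  # fraction of the way through segment i
            return waypoints[i] + alpha * (waypoints[i + 1] - waypoints[i])
    raise ValueError("times must be strictly increasing")
```

A controller running at a higher rate than the LLM could call `sample_trajectory` once per tick and feed the result to whatever low-level control law the robot uses.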


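📊 Experimental highlights

The reported results show that Corki reduces LLM inference frequency by up to 5.1×, delivers up to a 5.9× overall speedup, and improves success rate by up to 13.9%. These results support Corki's effectiveness for real-time robotic manipulation.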

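🎯 Application scenarios

Potential applications include smart manufacturing, automated warehousing, and service robots. By improving the real-time responsiveness and efficiency of robotic manipulation, Corki could raise productivity and operational safety, and may prove useful in further embodied AI applications.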

📄 Abstract (original)

Embodied AI robots have the potential to fundamentally improve the way human beings live and manufacture. Continued progress in the burgeoning field of using large language models to control robots depends critically on an efficient computing substrate, and this trend is strongly evident in manipulation tasks. In particular, today's computing systems for embodied AI robots for manipulation tasks are designed purely based on the interest of algorithm developers, where robot actions are divided into a discrete frame basis. Such an execution pipeline creates high latency and energy consumption. This paper proposes Corki, an algorithm-architecture co-design framework for real-time embodied AI-powered robotic manipulation applications. We aim to decouple LLM inference, robotic control, and data communication in the embodied AI robots' compute pipeline. Instead of predicting action for one single frame, Corki predicts the trajectory for the near future to reduce the frequency of LLM inference. The algorithm is coupled with a hardware that accelerates transforming trajectory into actual torque signals used to control robots and an execution pipeline that parallels data communication with computation. Corki largely reduces LLM inference frequency by up to 5.1×, resulting in up to 5.9× speed up. The success rate improvement can be up to 13.9%.