KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems

📄 arXiv: 2409.14908v2 📥 PDF

Authors: Zixuan Wang, Bo Yu, Junzhe Zhao, Wenhao Sun, Sai Hou, Shuai Liang, Xing Hu, Yinhe Han, Yiming Gan

Categories: cs.RO, cs.AI

Published: 2024-09-23 (updated: 2025-03-21)

🔗 Code/Project: GitHub


💡 One-sentence summary

Proposes KARMA to address the limited in-context memory of embodied AI agents

🎯 Matched areas: Pillar 1: Robot Control; Pillar 9: Embodied Foundation Models

Keywords: embodied AI, long- and short-term memory, task planning, memory augmentation, robotics

📋 Key points

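  1. Existing embodied AI agents struggle with in-context memory when executing long-sequence household tasks, which leads to inefficiency and frequent errors.
  2. KARMA integrates long-term and short-term memory modules to strengthen the planning ability of large language models, so that agents can handle complex tasks better.
  3. Experiments show that KARMA improves success rates by 1.3x on Composite Tasks and 2.3x on Complex Tasks, and raises task execution efficiency by 3.4x and 62.7x respectively.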

📝 Abstract (translated summary)

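Embodied AI agents often run into in-context memory limits when executing complex household tasks, which makes task execution inefficient and error-prone. To address this, the paper proposes KARMA, a memory system that combines long-term and short-term memory modules and improves the planning ability of large language models (LLMs) in embodied agents through memory-augmented prompting. KARMA separates the two kinds of memory: long-term memory captures a comprehensive 3D scene graph of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure lets the agent retrieve relevant past scene experiences, which improves the accuracy and efficiency of task planning. Compared with state-of-the-art memory-augmented embodied agents, KARMA improves success rates in the AI2-THOR simulator by 1.3x on Composite Tasks and 2.3x on Complex Tasks, and improves task execution efficiency by 3.4x and 62.7x respectively.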

🔬 Method details

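Problem definition: The paper targets the limited in-context memory of embodied AI agents executing long-sequence household tasks; existing methods are often inefficient and error-prone during task execution.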

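Core idea: KARMA introduces long-term and short-term memory modules that capture, respectively, long-lived information about the environment and its dynamic changes. This strengthens the agent's memory and thereby improves the accuracy and efficiency of task planning.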
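
The abstract calls this mechanism memory-augmented prompting. As a rough illustration only, the sketch below places memory contents ahead of the task text before the prompt is handed to an LLM. The function name `build_prompt`, the section headers, and the plain-string fact format are all assumptions made for this example; the source does not describe KARMA's actual prompt layout.

```python
def build_prompt(task, scene_facts, recent_changes):
    """Assemble a planning prompt with memory contents placed before the task.

    Hypothetical layout: the source does not specify KARMA's prompt format.
    """
    sections = [
        ("Known environment (long-term memory):", scene_facts),
        ("Recent object changes (short-term memory):", recent_changes),
    ]
    lines = []
    for header, items in sections:
        lines.append(header)
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            # Make an empty memory explicit so the planner does not guess.
            lines.append("- (none)")
    lines.append(f"Task: {task}")
    lines.append("Plan:")
    return "\n".join(lines)


prompt = build_prompt(
    task="serve a sliced apple",
    scene_facts=["the fridge is in the kitchen"],
    recent_changes=["apple: sliced, now on the countertop"],
)
print(prompt)
```

The resulting string would be sent to whichever LLM the agent uses for planning; that call is left out here because the source does not name a specific model API.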

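Technical framework: KARMA's architecture consists of a long-term memory module and a short-term memory module. Long-term memory stores the 3D scene graph of the environment, short-term memory records changes in object state, and the two work together to improve task execution.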
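
One way to picture the two modules is as a pair of stores: a scene graph for long-term memory and a per-object change log for short-term memory. The sketch below is a simplification under stated assumptions; the class names (`SceneNode`, `ObjectUpdate`, `DualMemory`), their fields, and the lookup order in `locate` are hypothetical and are not taken from the KARMA code base.

```python
from dataclasses import dataclass, field
from typing import Optional

Position = tuple[float, float, float]


@dataclass
class SceneNode:
    """One room or object in the long-term scene graph (hypothetical schema)."""
    name: str
    position: Position
    children: list[str] = field(default_factory=list)


@dataclass
class ObjectUpdate:
    """Short-term record of an object's latest position and state."""
    name: str
    position: Position
    state: str
    step: int


class DualMemory:
    def __init__(self) -> None:
        self.long_term: dict[str, SceneNode] = {}
        self.short_term: dict[str, ObjectUpdate] = {}

    def add_scene_node(self, node: SceneNode, parent: Optional[str] = None) -> None:
        """Insert a node into the scene graph, optionally under a parent node."""
        self.long_term[node.name] = node
        if parent is not None:
            self.long_term[parent].children.append(node.name)

    def record_change(self, update: ObjectUpdate) -> None:
        """Keep only the most recent update per object."""
        self.short_term[update.name] = update

    def locate(self, name: str) -> Optional[Position]:
        """Prefer the recent short-term record, then fall back to the scene graph."""
        if name in self.short_term:
            return self.short_term[name].position
        if name in self.long_term:
            return self.long_term[name].position
        return None


mem = DualMemory()
mem.add_scene_node(SceneNode("kitchen", (0.0, 0.0, 0.0)))
mem.add_scene_node(SceneNode("apple", (1.0, 0.9, 0.5)), parent="kitchen")
mem.record_change(ObjectUpdate("apple", (2.0, 0.9, 0.1), "sliced", step=3))
print(mem.locate("apple"))
```

Checking short-term memory first reflects the idea that a recently observed change should override the older scene-graph snapshot; whether KARMA resolves conflicts this way is an assumption of the sketch.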

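Key innovation: KARMA's main novelty is its dual-memory structure, which separates and manages long-term and short-term memory. According to the paper, this substantially improves the task execution ability of embodied agents compared with existing methods.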

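Key design: Short-term memory uses an effective memory replacement strategy that retains important information and discards irrelevant data, which makes better use of the limited memory.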
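
The source does not say which replacement policy KARMA uses. As a stand-in, the sketch below shows least-recently-used (LRU) eviction on a capacity-limited store, which is one common way to keep recently touched entries and drop stale ones. The class name `BoundedShortTermMemory` and its methods are hypothetical, and LRU here is a substitute technique, not the paper's strategy.

```python
from collections import OrderedDict


class BoundedShortTermMemory:
    """Capacity-limited store that evicts the least recently used entry.

    LRU is a stand-in policy; the source does not specify KARMA's strategy.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest use first, newest use last

    def write(self, obj, state):
        """Store or overwrite an object's state and mark it as just used."""
        if obj in self._entries:
            self._entries.move_to_end(obj)
        self._entries[obj] = state
        if len(self._entries) > self.capacity:
            # Drop the entry that has gone longest without a read or write.
            self._entries.popitem(last=False)

    def read(self, obj):
        """Return the stored state, or None if the object is not in memory."""
        if obj not in self._entries:
            return None
        self._entries.move_to_end(obj)
        return self._entries[obj]


stm = BoundedShortTermMemory(capacity=2)
stm.write("apple", "sliced")
stm.write("fridge", "open")
stm.read("apple")           # apple becomes the most recently used entry
stm.write("mug", "dirty")   # store is full, so "fridge" is evicted
print(stm.read("fridge"), stm.read("apple"), stm.read("mug"))
```

A policy that weighs task relevance rather than recency alone would change only the eviction step in `write`.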

📊 Experimental highlights

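In the AI2-THOR simulator, KARMA improves the success rate by 1.3x on Composite Tasks and 2.3x on Complex Tasks relative to state-of-the-art memory-augmented embodied agents, and improves task execution efficiency by 3.4x and 62.7x respectively.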

🎯 Application scenarios

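KARMA has broad application potential, particularly in household robots, service robots, and smart-home systems, where it could improve a robot's ability to carry out tasks in complex environments. In the future, its memory system could be extended to other domains such as autonomous driving and industrial automation.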

📄 Abstract (original)

Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory, with long-term memory capturing comprehensive 3D scene graphs as representations of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. Short-term memory employs strategies for effective and adaptive memory replacement, ensuring the retention of critical information while discarding less pertinent data. Compared to state-of-the-art embodied agents enhanced with memory, our memory-augmented embodied AI agent improves success rates by 1.3x and 2.3x in Composite Tasks and Complex Tasks within the AI2-THOR simulator, respectively, and enhances task execution efficiency by 3.4x and 62.7x. Furthermore, we demonstrate that KARMA's plug-and-play capability allows for seamless deployment on real-world robotic systems, such as mobile manipulation platforms. Through this plug-and-play memory system, KARMA significantly enhances the ability of embodied agents to generate coherent and contextually appropriate plans, making the execution of complex household tasks more efficient. The experimental videos from the work can be found at https://youtu.be/4BT7fnw9ehs. Our code is available at https://github.com/WZX0Swarm0Robotics/KARMA/tree/master.