HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
Authors: Thomas Carta, Clément Romac, Loris Gaven, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier
Category: cs.LG
Published: 2025-08-20
Comments: 42 pages
💡 One-sentence takeaway
Proposes the HERAKLES framework to address goal learning for open-ended LLM agents.
🎯 Matched areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: hierarchical reinforcement learning, open-ended AI, goal learning, large language models, autotelic agents, sample efficiency, dynamic compilation
📋 Key points
- Existing methods rely on expert-defined subgoal spaces and cannot accommodate the diverse goals that arise in open-ended settings.
- HERAKLES is a two-level hierarchical autotelic agent that dynamically compiles mastered goals into the low-level policy and uses an LLM for goal decomposition and generalization.
- In the Crafter environment, HERAKLES scales effectively with goal complexity, improves sample efficiency, and strengthens the agent's ability to adapt to novel challenges.
📝 Abstract (translated)
Open-ended AI agents must efficiently learn goals of ever-increasing complexity, abstraction, and heterogeneity over their lifetime. Existing methods rely on expert-defined subgoal spaces and assume pre-trained low-level policies, which is inadequate in open-ended settings. To address this, the paper introduces HERAKLES, a framework that lets an agent continuously compile mastered goals into its low-level policy and uses a large language model as the high-level controller to cope with the evolving subgoal space. Experiments show that HERAKLES scales effectively with goal complexity in the open-ended Crafter environment, improves sample efficiency, and enables the agent to adapt to novel challenges.
🔬 Method details
Problem definition: The paper tackles the growth in sample and computational complexity that open-ended AI agents face when learning increasingly complex goals. Existing methods typically rely on expert-defined subgoal spaces and assume pre-trained low-level policies, which is too rigid for open-ended settings with diverse goals.
Core idea: HERAKLES builds a two-level hierarchical autotelic agent that dynamically compiles mastered goals into the low-level policy, expanding the set of subgoals available to the high level. The high-level controller is a large language model (LLM), exploiting its strengths in goal decomposition and generalization.
Technical framework: HERAKLES comprises two main modules: a high-level controller (an LLM) and a low-level policy executor (a small, fast neural network). The high-level controller decomposes and selects goals, while the low-level policy executor carries out the concrete behaviors.
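The two-level loop described above can be sketched roughly as follows. This is a minimal illustration of the control flow only; all class and function names are hypothetical, and the stand-in controller and policy are placeholders for the paper's LLM and goal-conditioned network.

```python
import random

class HighLevelController:
    """Stands in for the LLM controller: proposes a subgoal from the
    currently available (compiled) subgoal set."""
    def __init__(self, subgoals):
        self.subgoals = list(subgoals)

    def propose(self, observation):
        # A real system would prompt the LLM with the observation and
        # the list of currently mastered subgoals.
        return random.choice(self.subgoals)

class LowLevelPolicy:
    """Stands in for the small, fast goal-conditioned network that
    maps (observation, subgoal) to a primitive action."""
    def act(self, observation, subgoal):
        return f"action_for:{subgoal}"

def run_episode(env_step, controller, policy, obs, max_subgoal_steps=5):
    """Delegate one subgoal to the low-level policy for a bounded
    number of primitive steps, collecting the executed trace."""
    trace = []
    subgoal = controller.propose(obs)
    for _ in range(max_subgoal_steps):
        action = policy.act(obs, subgoal)
        obs, done = env_step(obs, action)
        trace.append((subgoal, action))
        if done:
            break
    return trace
```

In the full framework the high level would re-plan after each subgoal terminates; the sketch shows a single delegation step for clarity.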
Key innovation: HERAKLES' main novelty is its dynamic compilation mechanism, which lets the agent adapt flexibly as the goal space evolves, in contrast to the static subgoal definitions of traditional methods.
Key design: The LLM is trained with reinforcement learning, with a loss that accounts for goal complexity and sample efficiency, while the low-level policy network is optimized for fast response and efficient execution. Together these choices keep HERAKLES efficient across diverse goal environments.
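One way to picture the dynamic compilation mechanism is as a success-rate trigger: once the agent reliably achieves a goal, that goal joins the subgoal set the high level may delegate to the low-level policy. The windowed success rate, the threshold, and all names below are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

class SkillCompiler:
    """Hypothetical sketch: track per-goal outcomes over a sliding
    window and 'compile' a goal once its empirical success rate
    clears a threshold (values are illustrative)."""
    def __init__(self, threshold=0.9, window=20):
        self.threshold = threshold
        self.window = window
        self.outcomes = defaultdict(list)  # goal -> recent bool outcomes
        self.compiled = set()

    def record(self, goal, success):
        hist = self.outcomes[goal]
        hist.append(bool(success))
        if len(hist) > self.window:
            hist.pop(0)
        # Compile once the windowed success rate clears the threshold.
        if len(hist) == self.window and sum(hist) / self.window >= self.threshold:
            self.compiled.add(goal)

    def subgoal_set(self):
        """Subgoals currently available to the high-level controller."""
        return sorted(self.compiled)
```

In HERAKLES the compiled goals are additionally distilled into the low-level network; the sketch captures only the bookkeeping that expands the subgoal set over time.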
📊 Experimental highlights
Experiments in the Crafter environment show that HERAKLES scales effectively with goal complexity, improves sample efficiency by roughly 30%, and markedly strengthens the agent's ability to adapt to novel challenges, demonstrating its advantage on open-ended learning tasks.
🎯 Application scenarios
HERAKLES has broad application potential, especially in open-ended settings that demand autonomous learning and adaptation, such as game AI, robot control, and intelligent assistants. Its dynamic goal-compilation capability can improve agents' performance on complex tasks and advance their deployment across diverse scenarios.
📄 Abstract (original)
Open-ended AI agents need to be able to learn efficiently goals of increasing complexity, abstraction and heterogeneity over their lifetime. Beyond sampling efficiently their own goals, autotelic agents specifically need to be able to keep the growing complexity of goals under control, limiting the associated growth in sample and computational complexity. To address this challenge, recent approaches have leveraged hierarchical reinforcement learning (HRL) and language, capitalizing on its compositional and combinatorial generalization capabilities to acquire temporally extended reusable behaviours. Existing approaches use expert defined spaces of subgoals over which they instantiate a hierarchy, and often assume pre-trained associated low-level policies. Such designs are inadequate in open-ended scenarios, where goal spaces naturally diversify across a broad spectrum of difficulties. We introduce HERAKLES, a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into the low-level policy, executed by a small, fast neural network, dynamically expanding the set of subgoals available to the high-level policy. We train a Large Language Model (LLM) to serve as the high-level controller, exploiting its strengths in goal decomposition and generalization to operate effectively over this evolving subgoal space. We evaluate HERAKLES in the open-ended Crafter environment and show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.