CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs

Authors: Bruce Yang, Xinfeng He, Huan Gao, Yifan Cao, Xiaofan Li, David Hsu

Category: cs.AI

Published: 2025-07-04

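💡 One-Line Summary

CodeAgents: a token-efficient prompting framework that codifies multi-agent reasoning in LLMs.

🎯 Matched Area: Pillar 9, Embodied Foundation Models

Keywords: multi-agent systems, large language models, prompt engineering, code generation, token efficiency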

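📋 Key Points

Existing structured prompting strategies are usually limited to single-agent, plan-only settings and evaluate performance solely by task accuracy, overlooking token efficiency, modularity, and scalability in multi-agent environments.
CodeAgents encodes multi-agent reasoning as modular pseudocode with control structures, boolean logic, and typed variables, turning loosely connected agent plans into coherent, interpretable, and verifiable programs.
Experiments on GAIA, HotpotQA, and VirtualHome show improved planning performance and substantially lower token usage, with a new state-of-the-art result on VirtualHome.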

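📝 Abstract (Translated)

This paper proposes CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, every component of agent interaction (tasks, plans, feedback, system roles, and external tool calls) is encoded as modular pseudocode, supplemented with control structures (e.g., loops and conditionals), boolean logic, and typed variables. This design turns loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. The framework is evaluated on three benchmarks, GAIA, HotpotQA, and VirtualHome, with a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural-language prompting baselines. On VirtualHome, the method achieves a new state-of-the-art success rate of 56%. It also reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics for developing scalable multi-agent LLM systems. Code and resources are available at https://anonymous.4open.science/r/CodifyingAgent-5A86.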

🔬 Method Details

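Problem definition: Prompt design largely determines the planning ability of LLM-driven agents, but existing structured prompting methods mostly target single-agent, plan-only settings and judge performance by task accuracy alone. They ignore token consumption, modularity, and scalability, which become critical in multi-agent environments.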

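Core idea: CodeAgents encodes each component of multi-agent interaction (task, plan, feedback, roles, and tool calls) as modular pseudocode. By introducing control structures (loops, conditionals), boolean logic, and typed variables, it turns otherwise loosely connected agent plans into structured, interpretable programs.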
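As a rough illustration of this idea, the sketch below holds the interaction components named above (task, role, plan, tool call, feedback) as typed Python variables and renders a plan as compact pseudocode lines. All class, field, and function names here (`Task`, `ToolCall`, `Plan`, `Feedback`, `render`) are hypothetical; they are not the paper's schema, and the paper's pseudocode is not necessarily Python.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical typed record of one external tool invocation.
    tool: str
    args: dict[str, str]


@dataclass
class Task:
    # The goal handed to an agent, plus the system role it plays.
    goal: str
    role: str  # illustrative roles, e.g. "planner" or "verifier"


@dataclass
class Plan:
    # Ordered tool calls proposed for a task.
    task: Task
    steps: list[ToolCall] = field(default_factory=list)


@dataclass
class Feedback:
    # A boolean verdict plus a short reason, instead of free-form prose.
    success: bool
    reason: str = ""


def render(plan: Plan) -> str:
    """Render a plan as compact pseudocode lines for inclusion in a prompt."""
    lines = [f"# role: {plan.task.role}", f"goal = {plan.task.goal!r}"]
    for call in plan.steps:
        args = ", ".join(f"{k}={v!r}" for k, v in call.args.items())
        lines.append(f"{call.tool}({args})")
    return "\n".join(lines)


if __name__ == "__main__":
    plan = Plan(
        Task("find the mug", "planner"),
        [ToolCall("walk", {"to": "kitchen"}), ToolCall("grab", {"obj": "mug"})],
    )
    print(render(plan))
```

Each step renders as a single short line such as `walk(to='kitchen')`, which hints at why a codified plan can be more compact than an equivalent natural-language description.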

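Technical framework: At its core, CodeAgents uses pseudocode to describe the interactions among agents. The framework comprises four main modules: 1) Task definition: specifies each agent's goal. 2) Plan generation: each agent produces an action plan for its goal, expressed in pseudocode. 3) Feedback: agents coordinate and adjust their plans through feedback messages. 4) Tool invocation: agents can call external tools to help complete the task. The process iterates, executing the pseudocode and exchanging feedback among agents, until the task is completed.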
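The plan, execute, and feedback loop described above can be sketched as follows, under loud assumptions: the LLM planner is replaced by a hypothetical rule-based `planner` stub, the two tools (`walk`, `grab`) and the shared `state` dict are invented for illustration, and verifier feedback is reduced to a single location hint. It shows the control flow only, not the authors' implementation.

```python
from typing import Callable, Optional

# A tool takes the shared world state and one argument, and reports success.
Tool = Callable[[dict, str], bool]


def walk(state: dict, target: str) -> bool:
    # Hypothetical navigation tool: always succeeds.
    state["location"] = target
    return True


def grab(state: dict, obj: str) -> bool:
    # Hypothetical grasp tool: succeeds only in the room containing the object.
    if state["objects"].get(obj) == state["location"]:
        state["holding"] = obj
        return True
    return False


TOOLS: dict[str, Tool] = {"walk": walk, "grab": grab}


def planner(goal_obj: str, hint: Optional[str]) -> list[tuple[str, str]]:
    # Stand-in for an LLM planner: naive first plan, revised once feedback arrives.
    if hint is None:
        return [("grab", goal_obj)]
    return [("walk", hint), ("grab", goal_obj)]


def run(goal_obj: str, state: dict, max_rounds: int = 3) -> tuple[bool, int]:
    """Iterate plan -> execute -> feedback until success or max_rounds."""
    hint: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        plan = planner(goal_obj, hint)
        # Execute tool calls in order; all() stops at the first failure.
        ok = all(TOOLS[name](state, arg) for name, arg in plan)
        if ok and state.get("holding") == goal_obj:
            return True, round_no
        # Verifier feedback (simplified): report where the object actually is.
        hint = state["objects"].get(goal_obj)
    return False, max_rounds


if __name__ == "__main__":
    world = {"location": "livingroom", "objects": {"mug": "kitchen"}}
    print(run("mug", world))  # (True, 2)
```

With the stub planner, the first plan fails, the feedback supplies the object's location, and the revised plan succeeds in round 2. In the actual framework, the planner and verifier would be LLM agents exchanging codified messages.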

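Key innovation: CodeAgents explicitly encodes the multi-agent reasoning process as a pseudocode program. Compared with conventional natural-language prompting, this representation is more structured and interpretable, and it substantially reduces token usage. The modular design also makes CodeAgents extensible, so it can be applied to a variety of multi-agent tasks.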
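One practical consequence of writing plans as code is that they can be checked mechanically before anything runs. The sketch below assumes plans are written in Python-compatible syntax (an assumption; the paper's pseudocode need not parse as Python) and uses the standard-library `ast` module to reject syntax errors and calls to tools outside a hypothetical allow-list.

```python
import ast


def verify_plan(source: str, allowed_tools: set[str]) -> list[str]:
    """Return the problems found in a codified plan; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}"]
    problems: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Only bare tool-name calls such as walk(room) are permitted.
            if not isinstance(node.func, ast.Name):
                problems.append(f"line {node.lineno}: unsupported call form")
            elif node.func.id not in allowed_tools:
                problems.append(f"line {node.lineno}: unknown tool {node.func.id!r}")
    return problems


if __name__ == "__main__":
    plan = (
        "for room in ['kitchen', 'bedroom']:\n"
        "    walk(room)\n"
        "    if see('mug') and not holding():\n"
        "        grab('mug')\n"
    )
    tools = {"walk", "see", "grab", "holding"}
    print(verify_plan(plan, tools))           # []
    print(verify_plan("fly('moon')", tools))  # ["line 1: unknown tool 'fly'"]
```

A natural-language plan offers no comparable hook: there is nothing to parse, so errors surface only when an agent tries to act.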

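Key design: CodeAgents describes agent behavior and interactions in pseudocode that contains control structures (such as if-else and for loops), boolean logic, and typed variables. Because it is a prompting framework, it involves no model training or loss function; the design goal is to minimize token usage while preserving the agents' reasoning ability.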
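To illustrate the three ingredients named here, the hypothetical plan below searches rooms with a `for` loop, guards the action with a boolean condition, and tracks progress in typed variables. The toy `Env` class and the `find_and_grab` plan are invented for illustration as a stand-in for a VirtualHome-style environment; they do not reproduce the paper's prompts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Env:
    # Hypothetical toy household: the objects visible in each room.
    rooms: dict[str, set[str]]
    location: str = "livingroom"
    holding: Optional[str] = None


def find_and_grab(env: Env, target: str, search_order: list[str]) -> bool:
    """Codified plan: visit rooms in order and grab the target when it is seen."""
    found: bool = False                      # typed state variable
    for room in search_order:                # control structure: loop
        env.location = room
        visible: set[str] = env.rooms.get(room, set())
        # Boolean logic: act only if the target is visible AND hands are free.
        if target in visible and env.holding is None:
            env.holding = target
            found = True
            break                            # stop searching once done
    # Post-condition that a verifier agent could check.
    return found and env.holding == target


if __name__ == "__main__":
    env = Env(rooms={"bedroom": {"book"}, "kitchen": {"mug"}})
    print(find_and_grab(env, "mug", ["bedroom", "kitchen"]), env.location)  # True kitchen
```

The explicit post-condition on the last line is what makes such a plan verifiable: another agent can check it without rereading a prose description.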

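📊 Experimental Highlights

CodeAgents improves planning performance on all three benchmarks (GAIA, HotpotQA, and VirtualHome), with absolute gains of 3-36 percentage points over natural-language prompting baselines. On VirtualHome it reaches a 56% success rate, a new state of the art. It also cuts token usage substantially: input tokens by 55-87% and output tokens by 41-70%. These results indicate that CodeAgents is both an efficient and an effective framework for multi-agent reasoning.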
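The token savings are relative reductions, whereas the accuracy gains are absolute differences in percentage points. A minimal sketch of both calculations follows; the token counts and accuracies in the example are hypothetical, chosen only to show the arithmetic (the token counts happen to reproduce the ends of the reported 55-87% input range), and are not taken from the paper.

```python
def reduction_pct(baseline_tokens: int, codified_tokens: int) -> float:
    """Relative token reduction of a codified prompt versus a baseline, in %."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - codified_tokens) / baseline_tokens


def gain_pp(baseline_acc: float, new_acc: float) -> float:
    """Absolute accuracy gain in percentage points (inputs are fractions)."""
    return 100.0 * (new_acc - baseline_acc)


if __name__ == "__main__":
    # Hypothetical counts: a 2000-token natural-language prompt vs. codified prompts.
    print(reduction_pct(2000, 900))        # 55.0
    print(reduction_pct(2000, 260))        # 87.0
    # Hypothetical accuracies: 0.40 -> 0.50 is a 10-point absolute gain.
    print(round(gain_pp(0.40, 0.50), 6))   # 10.0
```

Keeping the two metrics apart matters when reading the results: a 55% token reduction means the codified prompt uses 45% of the baseline's tokens, whereas a 10-point accuracy gain is a plain difference in success rate.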

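🎯 Application Scenarios

CodeAgents could be applied to a range of scenarios that require multi-agent collaboration, such as automated customer service, smart-home control, collaborative robots, and complex decision-support systems. By improving the planning ability and token efficiency of multi-agent systems, this work may help build more capable and efficient AI applications while lowering deployment and operating costs.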

📄 Abstract (Original)

Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environments. To address these limitations, we introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, all components of agent interaction - Task, Plan, Feedback, system roles, and external tool invocations - are codified into modular pseudocode enriched with control structures (e.g., loops, conditionals), boolean logic, and typed variables. This design transforms loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. We evaluate the proposed framework across three diverse benchmarks - GAIA, HotpotQA, and VirtualHome - using a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines. On VirtualHome, our method achieves a new state-of-the-art success rate of 56%. In addition, our approach reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics in the development of scalable multi-agent LLM systems. The code and resources are available at: https://anonymous.4open.science/r/CodifyingAgent-5A86
