MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

作者: Zikang Guo, Benfeng Xu, Xiaorui Wang, Zhendong Mao

分类: cs.AI

发布日期: 2025-05-27 (更新: 2025-06-05)

备注: Accepted to 34rd International Joint Conference on Artificial Intelligence (IJCAI 2025)

💡 一句话要点

提出MIRROR框架以优化工具学习中的多智能体反思机制

🎯 匹配领域: 支柱九：具身大模型 (Embodied Foundation Models)

关键词: 多智能体系统 工具学习 反思机制 决策优化 大型语言模型

📋 核心要点

现有方法在复杂工具集成任务中面临显著挑战，尤其是在错误轨迹的纠正方面。
MIRROR框架通过引入行动前的内部反思和基于观察的外部反思，全面提升了智能体的决策质量。
在StableToolBench和TravelPlanner基准测试中，MIRROR表现优异，超越了现有的最先进方法。

📝 摘要（中文）

复杂的工具集成任务对大型语言模型（LLMs）提出了重大挑战，促使多智能体工作流成为一种有前景的解决方案。反思被认为是纠正智能体工作流中错误轨迹的有效策略，但现有方法仅在行动后阶段利用这一能力。本文提出MIRROR框架，包含行动前的内部反思和基于观察的外部反思，系统性地利用LLMs的反思能力，从而在更广泛的范围内消除和纠正错误行动。通过在StableToolBench和TravelPlanner基准上的评估，MIRROR展现出优越的性能，达到了现有方法的最先进结果。

🔬 方法详解

问题定义：本文旨在解决复杂工具集成任务中，现有多智能体工作流在错误轨迹纠正方面的不足，尤其是仅在行动后进行反思的问题。

核心思路：MIRROR框架的核心在于引入内部反思机制，使智能体在执行行动前能够预见潜在的不良结果，从而优化决策过程。

技术框架：MIRROR框架分为两个主要模块：内部反思模块在行动前评估意图，外部反思模块在行动后根据观察调整轨迹。

关键创新：MIRROR的创新在于同时利用行动前和行动后的反思机制，系统性地消除和纠正错误行动，与现有方法相比，提供了更全面的反思能力。

关键设计：在设计中，MIRROR采用了特定的损失函数来评估反思效果，并结合了多层次的网络结构以增强模型的学习能力。通过这些设计，MIRROR能够更有效地处理复杂的决策任务。

🖼️ 关键图片

📊 实验亮点

在实验中，MIRROR在StableToolBench和TravelPlanner基准上取得了显著的性能提升，超越了现有最先进方法，具体表现为在任务成功率和决策准确性上均有明显提高，展示了其在复杂任务中的有效性和可靠性。

🎯 应用场景

MIRROR框架在机器人操作、自动化工具使用和复杂任务规划等领域具有广泛的应用潜力。其优化的反思机制能够提高智能体在动态环境中的决策能力，进而提升整体工作效率和安全性。未来，该框架可能推动更复杂的多智能体系统的发展，促进人机协作的智能化。

📄 摘要（原文）

Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic workflows. However, existing approaches only exploit such capability in the post-action stage, where the agent observes the execution outcomes. We argue that, like humans, LLMs can also engage in reflection before action execution: the agent can anticipate undesirable outcomes from its own decisions, which not only provides a necessarily complementary perspective to evaluate the decision but also prevents the propagation of errors throughout the trajectory. In this paper, we propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory based on observations. This design systematically leverages LLM reflection capabilities to eliminate and rectify erroneous actions on a more comprehensive scope. Evaluations on both the StableToolBench and TravelPlanner benchmarks demonstrate MIRROR's superior performance, achieving state-of-the-art results compared to existing approaches.

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

💡 一句话要点

📋 核心要点

📝 摘要（中文）

🔬 方法详解

🖼️ 关键图片

📊 实验亮点

🎯 应用场景

📄 摘要（原文）

⭐ 我的收藏

📁 新建收藏夹

⚙️ 管理收藏夹

🔍 搜索论文

🔐 登录 / 注册

👤 用户管理