Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety

📄 arXiv: 2508.03864v2

Authors: Zhenyu Pan, Yiting Zhang, Yutong Zhang, Jianshu Zhang, Haozheng Luo, Yuwei Han, Dennis Wu, Hong-Yu Chen, Philip S. Yu, Manling Li, Han Liu

Category: cs.AI

Published: 2025-08-05 (updated: 2025-09-06)

Comments: accepted by the Trustworthy FMs workshop at ICCV 2025


💡 One-Sentence Takeaway

Proposes Evo-MARL, which internalizes safety into multi-agent systems by training every task agent to resist adversarial threats alongside its primary function.

🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

Keywords: multi-agent systems, reinforcement learning, safety, adversarial attacks, evolutionary search, robustness, internalized safety mechanisms

📋 Key Points

  1. Existing defenses for multi-agent systems rely on external guard agents, which offer limited protection and create a single point of failure.
  2. Evo-MARL internalizes safety by training every task agent to perform its primary function while resisting adversarial threats, improving system-wide robustness.
  3. Experiments show that Evo-MARL reduces attack success rates by up to 22% and raises accuracy on reasoning tasks by up to 5%, improving safety and utility jointly.

📝 Abstract (Summary)

Multi-agent systems (MAS) built on multimodal large language models exhibit strong collaboration and performance. However, as their openness and interaction complexity grow, these systems face serious safety risks such as jailbreaks and adversarial attacks. Existing defenses typically rely on external safety modules, which offer limited protection and suffer from single-point failure. To address this, the paper proposes Evo-MARL, a novel multi-agent reinforcement learning framework in which all task agents jointly acquire defensive capabilities, internalizing safety mechanisms and continually improving MAS performance under co-evolving threats. Experiments show that Evo-MARL reduces attack success rates by up to 22% while improving accuracy on reasoning tasks by up to 5%, demonstrating that safety and utility can be improved together.

🔬 Method Details

Problem definition: The paper targets the safety of multi-agent systems under growing openness and interaction complexity. Existing methods depend on external guard agents, which offer limited protection and carry the risk of single-point failure.

Core idea: Evo-MARL uses multi-agent reinforcement learning to give every agent defensive capability alongside its primary task, internalizing the safety mechanism and removing the dependence on external guard modules.

Technical framework: Evo-MARL trains the task agents jointly as defenders, integrating evolutionary search with parameter-sharing reinforcement learning to form a co-evolution process between attackers and defenders; a toy sketch of this loop is given below.
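
To make the co-evolution loop concrete, here is a minimal, self-contained toy sketch. It is not the authors' implementation: attacks are reduced to scalar "strengths", the shared policy to a single defense parameter, and all names (`evolve_attackers`, `co_evolve`, etc.) are hypothetical.

```python
import random

def attack_success(attack, defense):
    """Toy criterion: an attack succeeds when it overpowers the shared defense."""
    return attack > defense

def evolve_attackers(population, defense, n_keep=4, n_children=4):
    """Evolutionary search step: keep the attacks that best beat the current
    defense, then mutate them to produce the next generation."""
    ranked = sorted(population, key=lambda a: a - defense, reverse=True)
    parents = ranked[:n_keep]
    children = [p + random.gauss(0, 0.1) for p in parents[:n_children]]
    return parents + children

def co_evolve(rounds=50, n_attackers=8, lr=0.05):
    """Alternate attacker evolution with a shared defender update, so that
    both sides improve against each other over training."""
    attackers = [random.random() for _ in range(n_attackers)]
    defense = 0.0  # one parameter shared by every task agent
    for _ in range(rounds):
        attackers = evolve_attackers(attackers, defense)
        # Parameter sharing: every agent's exposure to the evolved attacks
        # feeds a single update of the shared defense parameter.
        breach_rate = sum(attack_success(a, defense) for a in attackers) / len(attackers)
        defense += lr * breach_rate
    return defense, max(attackers)

if __name__ == "__main__":
    random.seed(0)
    defense, strongest = co_evolve()
    print(f"shared defense: {defense:.2f}, strongest evolved attack: {strongest:.2f}")
```

In the paper, the evolutionary search operates over attacker strategies rather than scalars, and the shared update is a reinforcement learning step on the task agents' policy; the toy above preserves only the alternating attacker/defender structure.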

Key innovation: Defensive capability is internalized into every agent, sidestepping the limitations of external safety modules and improving overall system robustness without adding overhead or a new single point of failure.

Key design: Agents are trained with a parameter-sharing reinforcement learning strategy whose objective trades off task performance against defensive capability, keeping agents effective in the face of adversarial threats. The overall architecture supports dynamic adjustment to adapt to continually changing attack patterns; a hedged sketch of such a weighted objective follows below.
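
The following is a hedged sketch of what such a weighted objective could look like; the paper as summarized here does not give an explicit formula, and `lambda_def`, `task_reward`, and `defense_reward` are illustrative names, not the authors' notation.

```python
def combined_reward(task_reward: float, defense_reward: float,
                    lambda_def: float = 0.5) -> float:
    """Trade off primary-task performance against resistance to attacks.

    lambda_def weights safety relative to utility; raising it over training
    would shift the shared policy toward defense as attacks grow stronger.
    """
    return (1.0 - lambda_def) * task_reward + lambda_def * defense_reward

# Example: an agent that solves its task (task_reward=1.0) but lets a
# jailbreak through (defense_reward=0.0) earns only 0.5 at lambda_def=0.5.
print(combined_reward(1.0, 0.0))  # 0.5
```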

📊 Experimental Highlights

Under adversarial attack, Evo-MARL lowers the attack success rate by up to 22% while raising accuracy on reasoning tasks by up to 5%. This shows that Evo-MARL achieves a joint improvement in safety and utility, outperforming existing defense methods.

🎯 Application Scenarios

Evo-MARL has broad application potential across domains such as intelligent transportation systems, financial transaction monitoring, and network security. By strengthening the safety and robustness of multi-agent systems, the framework can counter adversarial threats in complex environments, giving it substantial practical value and future impact.

📄 Abstract (Original)

Multi-agent systems (MAS) built on multimodal large language models exhibit strong collaboration and performance. However, their growing openness and interaction complexity pose serious risks, notably jailbreak and adversarial attacks. Existing defenses typically rely on external guard modules, such as dedicated safety agents, to handle unsafe behaviors. Unfortunately, this paradigm faces two challenges: (1) standalone agents offer limited protection, and (2) their independence leads to single-point failure: if compromised, system-wide safety collapses. Naively increasing the number of guard agents further raises cost and complexity. To address these challenges, we propose Evo-MARL, a novel multi-agent reinforcement learning (MARL) framework that enables all task agents to jointly acquire defensive capabilities. Rather than relying on external safety modules, Evo-MARL trains each agent to simultaneously perform its primary function and resist adversarial threats, ensuring robustness without increasing system overhead or single-node failure. Furthermore, Evo-MARL integrates evolutionary search with parameter-sharing reinforcement learning to co-evolve attackers and defenders. This adversarial training paradigm internalizes safety mechanisms and continually enhances MAS performance under co-evolving threats. Experiments show that Evo-MARL reduces attack success rates by up to 22% while boosting accuracy by up to 5% on reasoning tasks, demonstrating that safety and utility can be jointly improved.