Queen-Bee Agents: A BeeSpec-Centered Architecture for Governed Enterprise MCP Orchestration
Authors: Dutao Zhang, Liaotian
Categories: cs.SE, cs.AI
Published: 2026-06-04
Comments: Technical report. Prototype-level systems evidence; 59 enterprise-style tasks
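💡 One-Sentence Summary
Proposes Queen-Bee, a governed multi-agent architecture that addresses governance in enterprise agent systems.
🎯 Matched Area: Pillar 9: Embodied Foundation Models
Keywords: enterprise agent systems, multi-agent architecture, governance mechanisms, task execution, Model Context Protocol, knowledge retrieval, automated workflows, tenant isolation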
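📋 Key Points
- Enterprise agent systems that connect large language models to internal tools must handle policy enforcement and isolation in addition to raw task capability.
- In Queen-Bee, a Queen control plane retrieves capabilities and compiles a BeeSpec, so that tasks run under governance in a constrained environment.
- In the reported experiments, the retrieval-driven Queen-Bee variant reaches a task success rate of 96.4% with zero governance failures, and its scoped execution quality is substantially better than that of the baselines.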
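📝 Abstract (Translated Summary)
Enterprise agent systems increasingly need to connect large language models to private tools, internal knowledge, and Model Context Protocol (MCP) interfaces. In this setting, raw task capability is not enough: organizations also need policy enforcement, tenant-scoped isolation, and execution that stays within explicit operational boundaries. The paper presents Queen-Bee, a governed multi-agent architecture in which a Queen control plane retrieves capabilities, plans task execution, and compiles a structured BeeSpec that specialized Bee agents execute under constrained tool access. The authors implement a working prototype and evaluate it on 59 enterprise-style tasks; the system outperforms the baselines on task success rate, governance failures, and execution quality.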
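🔬 Method Details
Problem definition: The paper targets the policy enforcement, tenant isolation, and operational-boundary problems that arise when enterprise agent systems connect large language models to private tools. Existing approaches fall short on these points and do not meet enterprise governance requirements.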
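Core idea: Queen-Bee uses a control plane to retrieve capabilities, plan task execution, and compile a structured BeeSpec, so that execution happens under constrained tool access. The design aims to improve execution quality without giving up security or governance.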
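The capability-retrieval step can be pictured with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `Capability` record, the tag-overlap scoring, the `retrieve` function, and the sample `REGISTRY` entries are all hypothetical. It only shows one way a lightweight structured retriever over a small capability registry might stay tenant-scoped.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Capability:
    """Hypothetical registry entry: a named capability owned by one tenant."""
    name: str
    tenant: str
    tags: FrozenSet[str]


def retrieve(registry: Iterable[Capability], tenant: str,
             required_tags: Iterable[str], top_k: int = 3) -> List[Capability]:
    """Rank same-tenant capabilities by tag overlap with the request."""
    required = frozenset(required_tags)
    scored = []
    for cap in registry:
        if cap.tenant != tenant:
            continue  # other tenants' capabilities are never candidates
        overlap = len(cap.tags & required)
        if overlap > 0:
            scored.append((overlap, cap.name, cap))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cap for _, _, cap in scored[:top_k]]


# Illustrative registry only; names and tenants are made up.
REGISTRY = [
    Capability("crm.lookup", "acme", frozenset({"crm", "read"})),
    Capability("crm.update", "acme", frozenset({"crm", "write"})),
    Capability("docs.search", "acme", frozenset({"knowledge", "read"})),
    Capability("crm.lookup", "globex", frozenset({"crm", "read"})),
]
```

Filtering by tenant before scoring means a capability from another tenant cannot appear in the shortlist, however well its tags match.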
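Technical framework: The architecture has three main modules: the control plane, the BeeSpec compiler, and the Bee agents. The control plane handles capability retrieval and task planning, the BeeSpec compiler turns a task into an executable BeeSpec, and the Bee agents execute that spec in a constrained environment.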
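The division of labor between the three modules can be sketched as follows. This is a toy model under stated assumptions: the source does not describe the BeeSpec schema, so the `BeeSpec` fields, the `Queen.compile` method, the `tool_catalog` mapping, and the `Bee.call` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class BeeSpec:
    """Hypothetical spec: what to do, for whom, and with which tools."""
    task: str
    tenant: str
    allowed_tools: Tuple[str, ...]


class Queen:
    """Control plane stand-in: maps a task type to the tools it may use."""

    def __init__(self, tool_catalog: Dict[str, Tuple[str, ...]]):
        self.tool_catalog = tool_catalog  # assumed, hand-written catalog

    def compile(self, task_type: str, task: str, tenant: str) -> BeeSpec:
        if task_type not in self.tool_catalog:
            raise ValueError(f"no capability registered for {task_type!r}")
        return BeeSpec(task, tenant, self.tool_catalog[task_type])


class Bee:
    """Worker stand-in: can only reach tools named in its BeeSpec."""

    def __init__(self, spec: BeeSpec, tools: Dict[str, Callable[[str], str]]):
        self.spec = spec
        # Drop every tool the spec does not list, so it cannot be called later.
        self._tools = {n: f for n, f in tools.items() if n in spec.allowed_tools}

    def call(self, tool_name: str, arg: str) -> str:
        if tool_name not in self._tools:
            raise PermissionError(f"{tool_name!r} is outside this BeeSpec")
        return self._tools[tool_name](arg)
```

In this sketch the tool restriction is applied when the `Bee` is constructed rather than checked against a global tool table on each call, so an out-of-scope tool is simply absent from the agent's view.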
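Key innovation: The main contribution is the combination of a governance mechanism with the structured BeeSpec, which lets a multi-agent system execute tasks in a complex enterprise environment while keeping policy compliance and tenant isolation. Compared with existing approaches, the emphasis is on governance and execution quality rather than on capability alone.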
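Key design: The prototype uses tenant-scoped MCP connectors, audit-backed execution-time governance, and a retrieval-driven weak incubation strategy. Specific parameter settings and any loss-function design are not disclosed in detail; the core technical elements are the lightweight overall architecture and the structured retrieval mechanism.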
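As a rough illustration of how tenant scoping and audit-backed governance might fit together at execution time, the sketch below wraps a set of tools in a connector that records every decision. It does not use any real MCP SDK; `TenantConnector`, `AuditEntry`, and `invoke` are hypothetical names, and the allow/deny rule is an assumed policy, not the one used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AuditEntry:
    """One governance decision, recorded whether or not the call ran."""
    tenant: str
    tool: str
    decision: str  # "allow" or "deny"


@dataclass
class TenantConnector:
    """Hypothetical connector bound to a single tenant's tools."""
    tenant: str
    tools: Dict[str, Callable[[str], str]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def invoke(self, caller_tenant: str, tool: str, arg: str) -> str:
        # Assumed policy: the caller must belong to this connector's tenant
        # and the tool must be registered on this connector.
        allowed = caller_tenant == self.tenant and tool in self.tools
        self.audit_log.append(
            AuditEntry(caller_tenant, tool, "allow" if allowed else "deny")
        )
        if not allowed:
            raise PermissionError(f"{caller_tenant!r} may not call {tool!r}")
        return self.tools[tool](arg)
```

Writing the audit entry before the permission check raises means denied attempts are logged too, which is what allows governance failures to be counted after a run.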
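📊 Experimental Highlights
On the 59-task enterprise-style evaluation, the retrieval-driven Queen-Bee variant achieves a task success rate of 96.4% with zero governance failures, and its scoped execution quality is substantially better than that of both a static Queen-Bee baseline and a permissive single-agent baseline. This points to a clear advantage on governance-sensitive requests and scoped local execution.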
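🎯 Application Scenarios
Potential application areas include enterprise agent platforms, automated workflow management, and knowledge retrieval systems. By providing governance and execution mechanisms, the Queen-Bee architecture could help enterprises manage tasks in complex environments and may contribute to enterprise digital transformation.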
📄 Abstract (Original)
Enterprise agent systems increasingly need to connect large language models to private tools, internal knowledge, and Model Context Protocol (MCP) interfaces. In this setting, raw task capability is insufficient: organizations also require policy enforcement, tenant-scoped isolation, and execution that remains within explicit operational boundaries. We present Queen-Bee, a governed multi-agent architecture in which a Queen control plane retrieves capabilities, plans task-scoped execution, and compiles a structured BeeSpec that is executed by specialized Bee agents under constrained tool access. We implement a working prototype with tenant-scoped MCP connectors, audit-backed execution-time governance, retrieval-driven weak incubation, and multiple provisioning backends. We evaluate the system on 59 enterprise-style tasks spanning governance-sensitive requests, retrieval-driven provisioning, scoped local execution, and chemistry workflow integration. The retrieval-driven Queen-Bee variant achieves a task success rate of 0.964, zero governance failures, and substantially better scoped execution quality than both a static Queen-Bee baseline and a permissive single-agent baseline. We further show a multi-Bee chemistry workflow with explicit approval gating and a concrete top-3 shortlist grounded in real upstream evidence and screening artifacts. Additional comparisons with hybrid retrieval and LLM-guided provisioning show that richer provisioning backends are viable but do not outperform the lightweight structured retriever on the current small, highly structured capability registry. The results provide prototype-level systems evidence rather than a production deployment study, and suggest that enterprise agent platforms should be evaluated not only by capability, but also by governed provisioning, isolation behavior, scoped execution quality, and artifact-aware workflow coordination.