Skill Retrieval Augmentation for Agentic AI

Authors: Weihang Su, Jianming Long, Qingyao Ai, Yichen Tang, Changyue Wang, Yiteng Tu, Yiqun Liu

Categories: cs.CL, cs.AI

Published: 2026-04-27

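💡 One-Sentence Takeaway

Proposes the Skill Retrieval Augmentation (SRA) paradigm to address the skill-scaling bottleneck in agentic AI.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: Agentic AI, skill retrieval augmentation, large language models, external knowledge, skill incorporation

📋 Key Points

Existing agent systems enumerate skills in the context window, so as the skill corpus grows they hit context-budget limits and become less accurate at picking the right skill.
The paper proposes the Skill Retrieval Augmentation (SRA) paradigm, in which the agent dynamically retrieves, incorporates, and applies external skills on demand, addressing the skill-scaling bottleneck.
It builds the SRA-Bench benchmark; experiments show that retrieval-based skill augmentation substantially improves agent performance, while skill incorporation still leaves room for improvement.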

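📝 Abstract (translated from Chinese)

As large language models (LLMs) develop into agentic problem solvers, they increasingly depend on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the main strategy for incorporating skills is to explicitly enumerate the available skills in the context window. This strategy does not scale: as the skill corpus grows, the context budget is consumed quickly and the agent becomes markedly less accurate at identifying the right skill. The paper therefore proposes Skill Retrieval Augmentation (SRA), a new paradigm in which agents dynamically retrieve, incorporate, and apply relevant skills from a large external skill corpus on demand. To make the problem measurable, the authors construct a large-scale skill corpus and introduce SRA-Bench, the first benchmark for decomposed evaluation of the full SRA pipeline, covering skill retrieval, skill incorporation, and end-task execution. SRA-Bench contains 5,400 capability-intensive test instances and 636 manually constructed gold skills, which are mixed with web-collected distractor skills to form a corpus of 26,262 skills. Extensive experiments show that retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. The authors also find a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates regardless of whether a gold skill was retrieved or whether the task actually requires external capabilities. The bottleneck in skill augmentation therefore lies not only in retrieval but also in the base model's ability to decide which skill to load and when external loading is actually needed. These findings position SRA as a distinct research problem and lay a foundation for the scalable augmentation of capabilities in future agent systems.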

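🔬 Method Details

Problem definition: Existing agent systems rely on external skills when handling complex tasks. As the skill library grows, placing every skill description in the LLM's context window degrades performance, because the context length is limited and the relevant skill becomes harder to find among a large amount of information. The open problem is how to select and use relevant skills from a large-scale skill library effectively.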

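Core idea: The paper introduces a retrieval mechanism so that the agent retrieves relevant skills from a large-scale skill library when it needs them, instead of placing all skills in the context window. This lets the agent draw on external knowledge while avoiding the problems caused by the context-length limit. By retrieving and incorporating skills dynamically, the agent can adapt better to the needs of different tasks.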
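To make the scaling argument concrete, the arithmetic below compares the context cost of enumerating every skill with the cost of loading only a few retrieved ones. It is a back-of-the-envelope sketch: the corpus size of 26,262 comes from the paper, but the 50 tokens per skill description and the top-5 cutoff are assumptions chosen only for illustration.

```python
def enumeration_tokens(num_skills: int, tokens_per_skill: int) -> int:
    """Context cost when every skill description is listed in the prompt."""
    return num_skills * tokens_per_skill


def retrieval_tokens(k: int, tokens_per_skill: int) -> int:
    """Context cost when only the top-k retrieved skills are listed."""
    return k * tokens_per_skill


CORPUS_SIZE = 26_262       # reported size of the SRA-Bench skill corpus
TOKENS_PER_SKILL = 50      # assumed average description length (illustrative)
TOP_K = 5                  # assumed retrieval cutoff (illustrative)

full_cost = enumeration_tokens(CORPUS_SIZE, TOKENS_PER_SKILL)
retrieved_cost = retrieval_tokens(TOP_K, TOKENS_PER_SKILL)
print(f"enumerate all: {full_cost:,} tokens; retrieve top-{TOP_K}: {retrieved_cost:,} tokens")
```

Under these assumptions, enumeration grows linearly with the corpus while the retrieval cost stays fixed at k descriptions, which is the reason the paper argues enumeration cannot scale.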

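Technical framework: The SRA framework consists of the following main modules: 1) Skill library construction: build an external knowledge base containing a large number of skills. 2) Skill retrieval: retrieve relevant skills from the library according to the current task. 3) Skill incorporation: place the retrieved skills into the LLM's context for the model to use. 4) Task execution: the LLM carries out the task using the incorporated skills. The SRA-Bench benchmark is used to evaluate the performance of the whole pipeline.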
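The four modules above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Skill` type, the three corpus entries, the word-overlap scorer, and the `llm` callable are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short text used for retrieval
    body: str         # instructions placed in the context when selected


# Stage 1: a tiny stand-in for the external skill corpus (hypothetical entries).
CORPUS = [
    Skill("pdf-extract", "extract text and tables from pdf files", "Parse the file page by page."),
    Skill("csv-plot", "plot charts from csv data", "Load the rows, then draw the chart."),
    Skill("bisect-bug", "find the commit that introduced a bug", "Halve the commit range each step."),
]


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(task: str, corpus: list[Skill], k: int = 2) -> list[Skill]:
    """Stage 2: rank skills by word overlap with the task (toy scorer)."""
    task_tokens = _tokens(task)
    scored = [(len(task_tokens & _tokens(s.description)), s) for s in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [skill for score, skill in scored[:k] if score > 0]


def incorporate(task: str, skills: list[Skill]) -> str:
    """Stage 3: put only the retrieved skills, then the task, into the prompt."""
    blocks = [f"[skill: {s.name}]\n{s.body}" for s in skills]
    return "\n\n".join(blocks + [f"[task]\n{task}"])


def execute(prompt: str, llm: Callable[[str], str]) -> str:
    """Stage 4: hand the assembled prompt to the model."""
    return llm(prompt)


def run(task: str, llm: Callable[[str], str], k: int = 2) -> str:
    return execute(incorporate(task, retrieve(task, CORPUS, k)), llm)
```

Calling `run("extract tables from a pdf report", llm)` with any function that maps a prompt string to a reply would exercise all four stages; a real system would swap the overlap scorer for a proper retriever and the callable for a model client.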

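Key innovation: The paper's key contribution is the Skill Retrieval Augmentation (SRA) paradigm, which brings a retrieval mechanism into the way agents use skills. Compared with the conventional approach of enumerating skills in context, SRA handles large-scale skill libraries better and improves agent performance. The SRA-Bench benchmark also gives research in this area a shared testbed.

Key design: SRA-Bench contains 5,400 capability-intensive test instances and 26,262 skills, of which 636 are manually constructed gold skills and the rest are distractor skills collected from the web. The skill retrieval module can use different retrieval algorithms, for example retrieval based on vector similarity. The skill incorporation module has to add the retrieved skills to the LLM's context in a suitable form, for example through prompt engineering. The task execution module uses the LLM to carry out the task, and its performance is then evaluated.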
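As one example of vector-similarity retrieval, the sketch below ranks skills by cosine similarity between a query vector and each skill-description vector. It is illustrative only: the bag-of-words `embed` function stands in for a learned text encoder, and the skill names and descriptions are hypothetical rather than taken from SRA-Bench.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, skills: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (name, score) pairs with the highest similarity to the query."""
    q = embed(query)
    ranked = sorted(
        ((name, cosine(q, embed(desc))) for name, desc in skills.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical skill descriptions, keyed by skill name.
SKILLS = {
    "image-resize": "resize or crop an image file",
    "sql-query": "write a sql query against a database",
    "unit-convert": "convert between measurement units",
}

for name, score in top_k("resize an image to a thumbnail", SKILLS, k=2):
    print(f"{name}: {score:.3f}")
```

With a dense encoder the same ranking logic applies; only `embed` and the vector type change, and at the scale of tens of thousands of skills an approximate nearest-neighbor index would usually replace the full sort.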

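📊 Experimental Highlights

Experiments on SRA-Bench show that retrieval-based skill augmentation substantially improves agent performance, which supports the effectiveness of the paradigm. They also expose a weakness in skill incorporation: current LLM agents tend to load skills at similar rates whether or not a gold skill was retrieved, which makes skill incorporation an important direction for future research.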
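The incorporation finding can be checked with a simple diagnostic: compare how often the agent loads a skill when a gold skill was retrieved with how often it does so when none was. The sketch below assumes a hypothetical per-episode log format, and the eight episodes are made-up placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    gold_retrieved: bool  # was a gold skill among the retrieved candidates?
    skill_loaded: bool    # did the agent load any retrieved skill?


def load_rate(episodes: list[Episode], gold_retrieved: bool) -> float:
    """Fraction of episodes in the chosen condition where a skill was loaded."""
    subset = [e for e in episodes if e.gold_retrieved == gold_retrieved]
    if not subset:
        return float("nan")
    return sum(e.skill_loaded for e in subset) / len(subset)


def load_rate_gap(episodes: list[Episode]) -> float:
    """Load rate with a gold skill retrieved minus the rate without one."""
    return load_rate(episodes, True) - load_rate(episodes, False)


# Made-up placeholder log (NOT data from the paper).
LOG = [
    Episode(True, True), Episode(True, True), Episode(True, True), Episode(True, False),
    Episode(False, True), Episode(False, True), Episode(False, False), Episode(False, False),
]

print(f"gap = {load_rate_gap(LOG):.2f}")
```

A gap close to zero on real logs would correspond to the behavior described above, where loading decisions are largely insensitive to whether the right skill is available; a well-calibrated agent should show a clearly positive gap.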

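🎯 Application Scenarios

The results apply to settings where an agent has to interact with external knowledge, such as intelligent customer service, automated programming, and scientific research. With SRA, an agent can use external knowledge more effectively and solve problems better. SRA may become an important component of agentic AI and help agents reach a wider range of domains.

📄 Abstract (original)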

As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to scale: as skill corpora expand, context budgets are consumed rapidly, and the agent becomes markedly less accurate in identifying the right skill. To this end, this paper formulates Skill Retrieval Augmentation (SRA), a new paradigm in which agents dynamically retrieve, incorporate, and apply relevant skills from large external skill corpora on demand. To make this problem measurable, we construct a large-scale skill corpus and introduce SRA-Bench, the first benchmark for decomposed evaluation of the full SRA pipeline, covering skill retrieval, skill incorporation, and end-task execution. SRA-Bench contains 5,400 capability-intensive test instances and 636 manually constructed gold skills, which are mixed with web-collected distractor skills to form a large-scale corpus of 26,262 skills. Extensive experiments show that retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. At the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities. This shows that the bottleneck in skill augmentation lies not only in retrieval but also in the base model's ability to determine which skill to load and when external loading is actually needed. These findings position SRA as a distinct research problem and establish a foundation for the scalable augmentation of capabilities in future agent systems.
