SmartSearch: Process Reward-Guided Query Refinement for Search Agents

Authors: Tongyu Wen, Guanting Dong, Zhicheng Dou

Categories: cs.AI

Published: 2026-01-08

Comments: 16 pages, 6 figures

🔗 Code/Project: GitHub (https://github.com/MYVAE/SmartSearch)

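💡 One-Sentence Takeaway

SmartSearch: a process reward-guided query refinement framework that improves the knowledge retrieval ability of search agents

🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

Keywords: search agent, query refinement, process reward, curriculum learning, knowledge retrieval, large language model, information retrieval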

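📋 Key Points

Existing LLM-based search agents overlook the quality of intermediate search queries, which leads to poor retrieval results and limits overall performance.
SmartSearch uses process rewards to supervise query quality at a fine granularity and selectively refines low-quality queries to improve retrieval.
A three-stage curriculum learning framework guides the agent to progressively improve query quality; experiments show that SmartSearch consistently outperforms existing methods.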

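📝 Abstract (Translated from Chinese)

Search agents based on large language models (LLMs) show great potential for solving knowledge-intensive problems by integrating information retrieval capabilities. Existing work focuses mainly on optimizing the reasoning paradigms of search agents and overlooks the quality of the intermediate search queries issued during reasoning. As a result, the generated queries are often inaccurate, which produces unsatisfactory retrieval results and ultimately limits the overall effectiveness of search agents. To address this, we propose SmartSearch, a framework built on two key mechanisms: (1) process rewards, which provide fine-grained supervision of the quality of each intermediate search query through Dual-Level Credit Assessment; and (2) query refinement, which improves query generation by selectively refining low-quality search queries and regenerating the subsequent search rounds based on those refinements. To let the search agent progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework that takes the agent from imitation, to alignment, and finally to generalization. Experimental results show that SmartSearch consistently outperforms existing baselines, and additional quantitative analyses further confirm its significant gains in search efficiency and query quality.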

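🔬 Method Details

Problem definition: In knowledge-intensive tasks, existing LLM-based search agents generate low-quality intermediate queries, which yields inaccurate retrieval results and ultimately degrades the agent's overall performance. Existing methods focus mainly on optimizing the reasoning paradigm and overlook the importance of query quality.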

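Core idea: SmartSearch uses process rewards to guide query refinement. Each intermediate query is scored for quality; low-quality queries are refined, and the subsequent search rounds are regenerated from the refined query. In this way the agent gradually learns to generate high-quality queries, improving retrieval efficiency and accuracy.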
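
The refine-then-regenerate loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `refine_trajectory`, the callables `score_query`, `refine_query`, and `regenerate_after`, the threshold of 0.5, and the toy stand-ins are all assumptions introduced here.

```python
from typing import Callable, List


def refine_trajectory(
    queries: List[str],
    score_query: Callable[[str], float],
    refine_query: Callable[[str], str],
    regenerate_after: Callable[[List[str]], List[str]],
    threshold: float = 0.5,  # hypothetical cut-off for "low quality"
) -> List[str]:
    """Refine the first low-scoring query and regenerate the rounds after it."""
    for i, query in enumerate(queries):
        if score_query(query) < threshold:
            prefix = queries[:i] + [refine_query(query)]
            # Later rounds depended on the old query, so they are regenerated.
            return prefix + regenerate_after(prefix)
    return list(queries)  # every query passed; nothing to refine


# Toy stand-ins (hypothetical): score by word count, refine by adding detail.
def toy_score(query: str) -> float:
    return min(len(query.split()) / 4.0, 1.0)


def toy_refine(query: str) -> str:
    return query + " birthplace of the director"


def toy_regenerate(prefix: List[str]) -> List[str]:
    return [f"follow-up to: {prefix[-1]}"]


original = ["who directed the film Inception", "director", "born city population"]
refined = refine_trajectory(original, toy_score, toy_refine, toy_regenerate)
```

In this toy run the first query passes the threshold and is kept, the second is refined, and the third is discarded and regenerated from the refined prefix. In a real agent the three callables would be backed by a reward model and the LLM policy.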

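Technical framework: Training in SmartSearch proceeds in three stages: imitation learning, alignment learning, and generalization learning. In the imitation stage, the agent learns to imitate high-quality query generation. In the alignment stage, it learns to align query generation with the process rewards, thereby improving query quality. In the generalization stage, it learns to generate high-quality queries on new tasks. The framework also contains a Dual-Level Credit Assessment module, which evaluates the quality of intermediate queries, and a query refinement module, which refines low-quality queries.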
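
One simple way to picture the three-stage progression is a schedule that maps a training step to its stage. The sketch below is illustrative only: the names `Stage` and `stage_for_step` and the step budgets are assumptions, and the summary does not say how the actual stage boundaries are chosen.

```python
from enum import Enum


class Stage(Enum):
    IMITATION = "imitation"            # learn from high-quality query trajectories
    ALIGNMENT = "alignment"            # align query generation with process rewards
    GENERALIZATION = "generalization"  # carry the ability over to new tasks


def stage_for_step(step: int, imitation_steps: int, alignment_steps: int) -> Stage:
    """Return the curriculum stage active at a given training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < imitation_steps:
        return Stage.IMITATION
    if step < imitation_steps + alignment_steps:
        return Stage.ALIGNMENT
    return Stage.GENERALIZATION


# Hypothetical budgets: 100 imitation steps, then 50 alignment steps.
schedule = [stage_for_step(s, 100, 50) for s in (0, 99, 100, 149, 150)]
```

A training loop would pick its loss and data source based on the returned stage.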

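Key innovation: SmartSearch introduces a process reward mechanism that supervises the quality of intermediate queries at a fine granularity. Unlike existing methods, which reward only the final outcome, process rewards guide the agent more effectively toward generating high-quality queries. SmartSearch also proposes a three-stage curriculum learning framework so that the agent acquires this ability step by step.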

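Key design: The Dual-Level Credit Assessment module evaluates each query at two levels: the first assesses the query's relevance to the final answer, and the second assesses how informative the query is. The query refinement module uses reinforcement learning to optimize the query generation policy according to the process rewards. The three curriculum stages use different loss functions to guide the agent toward generating high-quality queries step by step.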
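
To make the two assessment levels concrete, the sketch below combines a relevance score and an informativeness score into a single per-query reward. The scoring rules are assumptions chosen for illustration (token overlap with the answer, and the share of passages not seen in earlier rounds), as are the function names and the equal weighting; the summary states only what each level assesses, not how it is computed.

```python
from typing import List, Set


def relevance(retrieved: List[str], answer: str) -> float:
    """Level 1 (assumed rule): fraction of answer tokens found in the retrieved text."""
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    found = set(" ".join(retrieved).lower().split())
    return len(answer_tokens & found) / len(answer_tokens)


def informativeness(retrieved: List[str], seen: Set[str]) -> float:
    """Level 2 (assumed rule): fraction of retrieved passages not seen in earlier rounds."""
    if not retrieved:
        return 0.0
    return sum(1 for passage in retrieved if passage not in seen) / len(retrieved)


def process_reward(
    retrieved: List[str], answer: str, seen: Set[str], weight: float = 0.5
) -> float:
    """Weighted mix of the two levels; `weight` is a hypothetical hyperparameter."""
    return weight * relevance(retrieved, answer) + (1.0 - weight) * informativeness(
        retrieved, seen
    )


retrieved = ["Christopher Nolan was born in London", "Inception is a 2010 film"]
seen = {"Inception is a 2010 film"}
reward = process_reward(retrieved, answer="London", seen=seen)
```

Here the answer token appears in the retrieved text (relevance 1.0) and one of the two passages is new (informativeness 0.5), so the combined reward is 0.75.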

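📊 Experimental Highlights

Experimental results show that SmartSearch outperforms existing baselines on multiple knowledge-intensive tasks. For example, on the HotpotQA dataset, SmartSearch improves accuracy over the best baseline by more than 5%. Quantitative analyses further show that SmartSearch markedly improves search efficiency and query quality, supporting the effectiveness of the process reward and query refinement mechanisms.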

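🎯 Application Scenarios

SmartSearch can be applied wherever knowledge retrieval is needed, such as question answering systems, intelligent assistants, and scientific research. By improving the query quality and retrieval efficiency of search agents, it can raise the performance and user experience of these applications. Looking ahead, the work encourages the use of LLM-based search agents in a wider range of domains and offers a new approach to making agents more capable.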

📄 Abstract (Original)

Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
