HaS: Accelerating RAG through Homology-Aware Speculative Retrieval
Authors: Peng Peng, Weiwei Lin, Wentai Wu, Xinyang Wang, Yongheng Liu
Categories: cs.IR, cs.CL
Published: 2026-04-22
Comments: Accepted by ICDE 2026
🔗 Code/Project: https://github.com/ErrEqualsNil/HaS
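💡 One-sentence takeaway
Proposes the HaS framework to accelerate the retrieval stage of RAG.
🎯 Matched area: Pillar 9: Embodied Foundation Models
Keywords: retrieval-augmented generation, homology relation, speculative retrieval, knowledge retrieval, efficiency improvement, multi-hop queries, large language models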
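📋 Key points
- In existing RAG methods, retrieval becomes slow and inefficient as the knowledge database grows.
- HaS performs speculative retrieval guided by the homology relation between queries: it quickly obtains candidate documents and then validates whether they can be used.
- Experiments show that HaS reduces retrieval latency by 23.74% to 36.99% across datasets, with an accuracy drop of only 1-2%.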
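📝 Abstract (Chinese summary, translated)
Retrieval-augmented generation (RAG) extends the knowledge boundary of large language models (LLMs) by retrieving external documents. However, as the knowledge database grows, retrieval becomes increasingly time-consuming. Existing acceleration strategies either sacrifice accuracy through approximate retrieval or achieve only marginal gains by reusing the results of strictly identical queries. This paper proposes HaS, a homology-aware speculative retrieval framework that obtains candidate documents through low-latency speculative retrieval over a restricted scope and then validates whether they contain the required knowledge. The validation is grounded in the homology relation between queries and is formulated as a homologous query re-identification task, so that once a homologous query is identified, the system can skip slow full-database retrieval. HaS substantially improves efficiency in practice: experiments show that it reduces retrieval latency by 23.74% to 36.99% across datasets with only a 1-2% drop in accuracy.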
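🔬 Method details
Problem definition: The paper targets the inefficiency of retrieval in RAG as the knowledge database grows. Existing acceleration methods either give up accuracy (approximate retrieval) or gain little (reusing results only for strictly identical queries), so retrieval remains time-consuming.
Core idea: HaS exploits the homology relation between queries for speculative retrieval: it quickly obtains candidate documents and then uses a validation step to decide whether they can be used, which improves retrieval efficiency.
Technical framework: HaS has two main stages. The first is low-latency speculative retrieval over a restricted scope, which produces candidate (draft) documents. The second validates the draft through homologous query re-identification: if a previously observed query is recognized as a homologous re-encounter of the incoming query, the draft is accepted and slow full-database retrieval is skipped; otherwise the system falls back to full-database retrieval.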
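The two-stage flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the repository linked in the abstract has that): documents and queries are toy embedding vectors, the restricted scope is assumed to be the set of documents already returned for previously seen queries, and the paper's homologous query re-identification step is replaced by a plain cosine-similarity threshold (`homology_threshold`). All names (`SpeculativeRetriever`, `restricted_scope`, `is_homologous`) are hypothetical.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Vector, docs: dict[str, Vector], k: int) -> list[str]:
    """Exact top-k document ids by cosine similarity over `docs`."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]


@dataclass
class SpeculativeRetriever:
    full_db: dict[str, Vector]       # whole knowledge base (slow path)
    k: int = 2
    homology_threshold: float = 0.9  # hypothetical acceptance cutoff
    history: list[tuple[Vector, list[str]]] = field(default_factory=list)

    def restricted_scope(self) -> dict[str, Vector]:
        # Assumed scope: only documents already returned for past queries.
        ids = {doc_id for _, docs in self.history for doc_id in docs}
        return {doc_id: self.full_db[doc_id] for doc_id in ids}

    def is_homologous(self, query: Vector) -> bool:
        # Stand-in for homologous query re-identification: accept if some
        # previously observed query is similar enough to the incoming one.
        return any(cosine(query, past) >= self.homology_threshold
                   for past, _ in self.history)

    def retrieve(self, query: Vector) -> tuple[list[str], bool]:
        """Return (document ids, whether the speculative draft was accepted)."""
        scope = self.restricted_scope()
        if scope:
            draft = top_k(query, scope, self.k)  # stage 1: cheap speculative search
            if self.is_homologous(query):        # stage 2: validate the draft
                return draft, True               # bypass full-database retrieval
        result = top_k(query, self.full_db, self.k)  # fallback: full search
        self.history.append((query, result))         # remember query for later
        return result, False


# Toy usage: 3-d "embeddings" for four documents.
db = {
    "paris": [1.0, 0.0, 0.0],
    "france": [0.9, 0.1, 0.0],
    "tokyo": [0.0, 1.0, 0.0],
    "japan": [0.0, 0.9, 0.1],
}
retriever = SpeculativeRetriever(full_db=db)
docs1, hit1 = retriever.retrieve([1.0, 0.05, 0.0])   # cold start: full search
docs2, hit2 = retriever.retrieve([0.98, 0.08, 0.0])  # near-paraphrase: draft accepted
docs3, hit3 = retriever.retrieve([0.0, 1.0, 0.05])   # unrelated: falls back
print(docs1, hit1)  # ['paris', 'france'] False
print(docs2, hit2)  # ['france', 'paris'] True
print(docs3, hit3)  # ['tokyo', 'japan'] False
```

Here the second query is a near-paraphrase of the first, so its draft is accepted without searching the full database, while the unrelated third query falls back to full retrieval. A real system would swap the similarity threshold for a trained re-identification model and tune the scope and acceptance rule to trade latency against accuracy.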
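Key innovation: The main innovation is using query homology for both speculative retrieval and its validation. Whenever the system recognizes a homologous query it can skip full-database retrieval, and because homologous queries are common under real-world popularity patterns, this yields a substantial efficiency gain.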
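The efficiency argument can be made concrete with a generic cost model for speculate-then-verify schemes. This is an illustrative assumption, not an equation from the paper: T_spec is the latency of retrieval over the restricted scope, T_val the latency of validation, T_full the latency of full-database retrieval, and α the fraction of queries whose draft is accepted, with the stages assumed to run sequentially.

```latex
\begin{aligned}
\mathbb{E}[T] &= T_{\text{spec}} + T_{\text{val}} + (1-\alpha)\,T_{\text{full}}, \\
\mathbb{E}[T] < T_{\text{full}} &\iff T_{\text{spec}} + T_{\text{val}} < \alpha\,T_{\text{full}}.
\end{aligned}
```

Under this model the saving grows with α, which is why a high prevalence of homologous queries matters, and why the restricted-scope search and the validation step must stay cheap relative to full retrieval.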
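Key design: HaS uses specific parameter settings and loss functions to optimize homologous query identification, so that retrieval stays fast while accuracy remains high.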
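📊 Experimental highlights
Experiments show that HaS reduces retrieval latency by 23.74% to 36.99% across datasets with only a 1-2% drop in accuracy. As a plug-and-play solution, HaS also significantly accelerates complex multi-hop queries in modern agentic RAG pipelines.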
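For intuition, a relative latency reduction r corresponds to a speedup S = 1/(1 - r). Applied to the reported range (this covers the retrieval step only, not end-to-end generation latency):

```latex
S = \frac{1}{1-r}, \qquad
S_{r=0.2374} = \frac{1}{0.7626} \approx 1.31, \qquad
S_{r=0.3699} = \frac{1}{0.6301} \approx 1.59
```

That is, retrieval runs roughly 1.31× to 1.59× faster.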
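🎯 Application scenarios
HaS has broad application potential, especially in question-answering systems and information retrieval tasks that require fast responses. Its efficient retrieval mechanism can noticeably improve user experience and suits real-time data processing and large-scale knowledge bases. In the future, HaS may play an important role in more complex multi-hop queries and in environments where knowledge is updated dynamically.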
📄 Abstract (original)
Retrieval-Augmented Generation (RAG) expands the knowledge boundary of large language models (LLMs) at inference by retrieving external documents as context. However, retrieval becomes increasingly time-consuming as the knowledge databases grow in size. Existing acceleration strategies either compromise accuracy through approximate retrieval, or achieve marginal gains by reusing results of strictly identical queries. We propose HaS, a homology-aware speculative retrieval framework that performs low-latency speculative retrieval over restricted scopes to obtain candidate documents, followed by validating whether they contain the required knowledge. The validation, grounded in the homology relation between queries, is formulated as a homologous query re-identification task: once a previously observed query is identified as a homologous re-encounter of the incoming query, the draft is deemed acceptable, allowing the system to bypass slow full-database retrieval. Benefiting from the prevalence of homologous queries under real-world popularity patterns, HaS achieves substantial efficiency gains. Extensive experiments demonstrate that HaS reduces retrieval latency by 23.74% and 36.99% across datasets with only a 1-2% marginal accuracy drop. As a plug-and-play solution, HaS also significantly accelerates complex multi-hop queries in modern agentic RAG pipelines. Source code is available at: https://github.com/ErrEqualsNil/HaS.