ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
Authors: Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou
Categories: cs.IR, cs.AI, cs.CL, cs.LG
Published: 2025-08-09 (updated: 2025-08-22)
Comments: 21 pages
🔗 Code/Project: GitHub (https://github.com/8421BCD/ReasonRank)
💡 One-Sentence Takeaway
Proposes ReasonRank, a reasoning-intensive listwise reranker that tackles passage ranking in complex scenarios.
🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: passage ranking, reasoning models, reinforcement learning, data synthesis, information retrieval
📋 Key Points
- Existing reasoning-intensive rerankers perform poorly in complex scenarios, mainly because high-quality reasoning training data is scarce.
- This paper proposes an automated training-data synthesis framework and a two-stage post-training approach combining cold-start fine-tuning with reinforcement learning to strengthen reranking ability.
- ReasonRank achieves state-of-the-art performance of 40.6 on the BRIGHT benchmark, a significant improvement over existing baselines, with lower latency.
📝 Abstract (Translated)
Large language model (LLM) based listwise ranking performs strongly on many passage ranking tasks. With the development of large reasoning models, step-by-step reasoning at test time has been shown to improve listwise ranking performance. However, owing to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in complex ranking scenarios. This paper proposes an automated framework for synthesizing reasoning-intensive training data, with a self-consistency data filtering mechanism to ensure data quality. It further introduces a two-stage post-training approach that combines cold-start supervised fine-tuning with reinforcement learning, substantially improving reranking ability. Experiments show that ReasonRank achieves state-of-the-art performance of 40.6 on the BRIGHT benchmark, with significantly lower latency than the pointwise reranker Rank1.
🔬 Method Details
Problem definition: The paper targets the poor performance of reasoning-intensive rerankers in complex ranking scenarios, where existing methods are held back by the lack of high-quality training data.
Core idea: Build an automated training-data synthesis framework that draws queries and passages from diverse domains and generates high-quality labels, then apply a two-stage post-training approach to strengthen the reranker's reasoning ability.
Technical framework: The pipeline consists of a data synthesis module, a self-consistency data filtering mechanism, a cold-start supervised fine-tuning (SFT) stage, and a reinforcement learning (RL) stage, jointly improving both reasoning ability and ranking performance.
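To make the self-consistency filtering concrete, here is a minimal Python sketch. It assumes (our assumption, not detailed in this summary) that "self-consistency" means keeping a synthesized example only when several independently sampled teacher rankings agree; the Kendall-tau threshold, sample count, and `teacher_rank_fn` helper are hypothetical placeholders rather than the paper's exact mechanism.

```python
# Minimal sketch of self-consistency data filtering (assumption: agreement is
# measured with Kendall's tau between rank-position vectors; threshold, sample
# count, and teacher_rank_fn are hypothetical placeholders).
from itertools import combinations
from scipy.stats import kendalltau


def to_positions(ranking, num_passages):
    """Convert an ordered list of passage ids (best first) into a position vector."""
    pos = [0] * num_passages
    for rank, pid in enumerate(ranking):
        pos[pid] = rank
    return pos


def rankings_agree(rankings, num_passages, min_tau=0.7):
    """True if every pair of sampled rankings has Kendall's tau >= min_tau."""
    vectors = [to_positions(r, num_passages) for r in rankings]
    return all(kendalltau(a, b)[0] >= min_tau for a, b in combinations(vectors, 2))


def filter_synthesized_data(examples, teacher_rank_fn, num_samples=3):
    """Keep only (query, passages) pairs whose teacher rankings are self-consistent.

    teacher_rank_fn(query, passages) -> list of passage indices, best first;
    it stands in for a call to a reasoning LLM such as DeepSeek-R1.
    """
    kept = []
    for query, passages in examples:
        sampled = [teacher_rank_fn(query, passages) for _ in range(num_samples)]
        if rankings_agree(sampled, len(passages)):
            kept.append((query, passages, sampled[0]))  # keep one ranking as the label
    return kept
```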
Key innovation: The most important contribution is a multi-view ranking reward, which improves ranking ability more effectively than a conventional ranking metric-based reward.
Key design: The cold-start stage uses supervised fine-tuning to learn reasoning patterns, while the RL stage optimizes the multi-view reward, so that both reasoning and ranking ability are strengthened.
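The exact composition of the multi-view reward is not spelled out in this summary. The sketch below shows one plausible form under our own assumptions: an NDCG view over the whole predicted list combined with a top-k recall view, with hypothetical weights.

```python
# Hedged sketch of a multi-view ranking reward for the RL stage. The concrete
# views chosen here (NDCG over the full list plus top-k recall) and the weights
# are illustrative assumptions, not the paper's definition.
import math


def ndcg(predicted, relevance, k=10):
    """NDCG@k for a predicted ordering of passage ids with graded relevance labels."""
    dcg = sum(relevance.get(pid, 0) / math.log2(rank + 2)
              for rank, pid in enumerate(predicted[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(predicted, relevance, k=10):
    """Fraction of relevant passages that appear in the top-k of the prediction."""
    relevant = {pid for pid, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(predicted[:k])) / len(relevant)


def multi_view_reward(predicted, relevance, weights=(0.5, 0.5), k=10):
    """Weighted combination of two complementary ranking views (hypothetical weights)."""
    return weights[0] * ndcg(predicted, relevance, k) + \
           weights[1] * recall_at_k(predicted, relevance, k)
```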
📊 Experimental Highlights
ReasonRank reaches state-of-the-art performance of 40.6 on the BRIGHT benchmark, a significant improvement over existing baselines, and achieves lower latency than the pointwise reranker Rank1, pointing to both efficiency and reliability in practical use.
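A back-of-the-envelope sketch of the latency argument: a pointwise reasoning reranker such as Rank1 emits one reasoning chain per passage, whereas a listwise reranker reasons once per sliding window, so far fewer expensive generations are needed. The window and stride values below are illustrative, not taken from the paper.

```python
# Rough comparison of the number of LLM reasoning generations needed to rerank
# num_passages candidates. Window/stride values are illustrative assumptions.
import math


def pointwise_calls(num_passages: int) -> int:
    return num_passages  # one reasoning generation per (query, passage) pair


def listwise_calls(num_passages: int, window: int = 20, stride: int = 10) -> int:
    if num_passages <= window:
        return 1
    return 1 + math.ceil((num_passages - window) / stride)


if __name__ == "__main__":
    n = 100
    print(pointwise_calls(n), listwise_calls(n))  # e.g. 100 vs 9 generations
```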
🎯 Application Scenarios
Potential applications include information retrieval, question answering, and recommender systems, where the method can improve passage ranking quality for complex queries. ReasonRank may also benefit a broader range of natural language processing tasks, advancing intelligent search and information access.
📄 Abstract (Original)
Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning during test-time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. A self-consistency data filtering mechanism is designed to ensure the data quality. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach, which includes a cold-start supervised fine-tuning (SFT) stage for reasoning pattern learning and a reinforcement learning (RL) stage for further ranking ability enhancement. During the RL stage, based on the nature of listwise ranking, we design a multi-view ranking reward, which is more effective than a ranking metric-based reward. Extensive experiments demonstrate that our trained reasoning-intensive reranker ReasonRank outperforms existing baselines significantly and also achieves much lower latency than pointwise reranker Rank1. Through further experiments, our ReasonRank has achieved state-of-the-art (SOTA) performance 40.6 on the BRIGHT leaderboard (https://brightbenchmark.github.io/). Our codes are available at https://github.com/8421BCD/ReasonRank.