Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
Authors: Shuzhang Zhong, Haochen Huang, Shengxuan Qiu, Pengfei Zuo, Runsheng Wang, Meng Li
Categories: cs.LG
Published: 2026-05-11
Comments: OSDI 2026
💡 One-Sentence Summary
Proposes SPEX, which breaks the reward dependency barrier to accelerate Tree-of-Thought reasoning
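🎯 Matched Area: Pillar 9: Embodied Foundation Models
Keywords: Tree-of-Thought reasoning, large language models, reward dependency, inference efficiency, speculative path selection, dynamic resource allocation, adaptive pruning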
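📋 Key Points
- The efficiency of existing Tree-of-Thought (ToT) reasoning is constrained by the reward dependency barrier, which limits search parallelism and introduces substantial latency.
- The proposed system, SPEX, accelerates reasoning through speculative path selection and dynamic budget allocation, breaking the barrier that limits existing approaches.
- Extensive experiments show that SPEX achieves a 1.2× to 3× speedup across different ToT algorithms, and a cumulative speedup of up to 4.1× when combined with token-level speculative decoding.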
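📝 Summary
Tree-of-Thought (ToT) reasoning structures Large Language Model (LLM) inference as a tree-based search and shows strong potential for solving complex mathematical and programming tasks. Its efficiency, however, is constrained by the reward dependency barrier. To address this, the paper proposes SPEX, which combines three key techniques: intra-query speculative path selection, inter-query budget allocation, and adaptive early termination. Experiments show that SPEX achieves a 1.2× to 3× speedup across a range of ToT reasoning algorithms, and a cumulative speedup of up to 4.1× when paired with token-level speculative decoding.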
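🔬 Method Details
Problem definition: The paper targets the efficiency bottleneck of existing ToT reasoning, in particular the synchronization barrier caused by reward dependency: each search step must wait for reward scores before it can choose which branches to expand, which limits parallelism and adds latency.
Core idea: By exploring reasoning paths speculatively, SPEX breaks the reward synchronization barrier. Branches that are predicted to be selected can be expanded before their rewards are available, so multiple paths proceed in parallel and overall efficiency improves.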
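The core idea can be illustrated with a minimal sketch of one speculative selection step. This is a toy model under stated assumptions, not the SPEX implementation: `predict` stands in for some cheap, immediately available guess of a branch's reward, `reward` stands in for the slower reward model, and both names, along with `speculative_select`, are hypothetical.

```python
from typing import Callable, List, Tuple


def speculative_select(
    candidates: List[str],
    predict: Callable[[str], float],  # hypothetical cheap guess, available at once
    reward: Callable[[str], float],   # hypothetical true reward, available later
    k: int,
) -> Tuple[List[str], List[str], List[str]]:
    """Compare speculated branches with reward-selected ones for one step.

    Returns (hits, wasted, missed):
      hits   - speculated branches that the reward also selects (work kept)
      wasted - speculated branches that the reward rejects (work discarded)
      missed - reward-selected branches that were not speculated (expanded late)
    """
    # Branches we would start expanding without waiting for the reward.
    # sorted() is stable, so ties keep the original candidate order.
    speculated = sorted(candidates, key=predict, reverse=True)[:k]
    # Branches the reward-guided search selects once scores arrive.
    selected = sorted(candidates, key=reward, reverse=True)[:k]
    hits = [c for c in speculated if c in selected]
    wasted = [c for c in speculated if c not in selected]
    missed = [c for c in selected if c not in speculated]
    return hits, wasted, missed


# Toy scores, made up purely for illustration.
predicted = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}
actual = {"a": 0.8, "b": 0.3, "c": 0.6, "d": 0.1}
hits, wasted, missed = speculative_select(
    list(predicted), predicted.get, actual.get, k=2
)
```

In this toy run, branch `a` is a correct speculation, `b` is wasted work, and `c` must still be expanded after the reward arrives. The benefit of speculation therefore depends on how often the prediction agrees with the reward.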
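Technical framework: SPEX consists of three core modules: intra-query speculative path selection, which predicts and expands high-potential branches; inter-query budget allocation, which dynamically balances speculative resources across queries; and adaptive early termination, which prunes deep and redundant search branches.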
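As a rough illustration of inter-query budget allocation, the sketch below splits a fixed number of speculative slots across queries in proportion to a per-query weight. The weighting signal, the function name `allocate_budget`, and the largest-remainder rounding are all assumptions made for this example; the source does not specify how SPEX computes its allocation.

```python
from typing import Dict


def allocate_budget(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` speculative slots across queries, proportional to weights.

    Assumes non-negative weights. Uses largest-remainder rounding so the
    integer allocations always sum to `total` when any weight is positive.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0 or total <= 0:
        return {query: 0 for query in weights}
    # Ideal fractional share of the budget for each query.
    shares = {q: total * w / weight_sum for q, w in weights.items()}
    # Start from the floor of each share.
    alloc = {q: int(share) for q, share in shares.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining slots to the queries with the largest remainders.
    by_remainder = sorted(weights, key=lambda q: shares[q] - alloc[q], reverse=True)
    for q in by_remainder[:leftover]:
        alloc[q] += 1
    return alloc


# Hypothetical weights, e.g. how promising speculation looks for each query.
budget = allocate_budget({"q1": 5, "q2": 3, "q3": 2}, total=8)
```

Here `q1` receives 4 slots and `q2` and `q3` receive 2 each. A dynamic scheme would recompute the weights as the search progresses, so queries whose speculation pays off receive more of the shared budget.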
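Key innovation: SPEX's main contribution is the introduction of speculative path selection and dynamic budget allocation, which substantially improve search efficiency while preserving accuracy.
Key design: At the technical level, SPEX selects paths based on reward prediction, adjusts resource allocation dynamically according to context, and uses an adaptive mechanism to prune inefficient paths quickly.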
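The pruning step can be pictured with a small sketch. The rule below is an illustrative assumption, not the paper's criterion: a branch is terminated if it is much deeper than the median depth of already finished answers, or if its score trails the best finished answer by a margin. `Branch`, `should_terminate`, `slack`, and `margin` are hypothetical names and parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    depth: int    # number of reasoning steps taken so far
    score: float  # hypothetical reward estimate in [0, 1]


def should_terminate(
    branch: Branch,
    finished_depths: List[int],
    best_score: float,
    slack: float = 1.5,
    margin: float = 0.2,
) -> bool:
    """Decide whether to stop expanding `branch` (illustrative rule only)."""
    if finished_depths:
        # Upper median of the depths at which other branches finished.
        typical = sorted(finished_depths)[len(finished_depths) // 2]
        # Depth rule: the branch is far deeper than typical finished answers.
        if branch.depth > slack * typical:
            return True
    # Redundancy rule: a clearly better answer already exists.
    return branch.score < best_score - margin


finished = [4, 5, 6]
too_deep = should_terminate(Branch(depth=10, score=0.9), finished, best_score=0.8)
competitive = should_terminate(Branch(depth=5, score=0.7), finished, best_score=0.8)
too_weak = should_terminate(Branch(depth=5, score=0.5), finished, best_score=0.8)
```

The thresholds adapt to the search itself because they are derived from finished branches rather than fixed in advance, which is the sense in which such a rule would suit a skewed search tree.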
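📊 Experimental Highlights
SPEX achieves a 1.2× to 3× speedup across a range of ToT reasoning algorithms, and a cumulative speedup of up to 4.1× when combined with token-level speculative decoding. These results are supported by extensive experiments against baselines, and ablation studies confirm the contribution of each technique.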
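🎯 Application Scenarios
Potential applications include complex decision-support systems, real-time programming assistants, and advanced mathematical problem solving. By making ToT reasoning faster, SPEX could give large language models better real-time interactivity and broaden their use in intelligent assistance and automation.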
📄 Abstract (Original)
Tree-of-Thought (ToT) reasoning structures Large Language Model (LLM) inference as a tree-based search, demonstrating strong potential for solving complex mathematical and programming tasks. However, its efficiency is constrained by the reward dependency barrier -- a synchronization bottleneck caused by sequential reward-guided exploration that limits search parallelism and introduces substantial latency. Prior system optimizations, mainly designed for linear Chain-of-Thought (CoT) reasoning, cannot address these challenges, leaving the efficiency of ToT underexplored. To enhance ToT reasoning efficiency, we observe that the reasoning paths can be explored speculatively to break the reward synchronization barrier. Therefore, in this paper, we propose SPEX and introduce three key techniques: (i) intra-query speculative path selection to predict and expand high-potential branches of ToT, (ii) inter-query budget allocation to balance speculative resource allocation across queries dynamically, and (iii) adaptive early termination to prune deep and redundant branches for a skewed search tree. We implement SPEX on top of the SGLang framework and evaluate it across diverse ToT algorithms and LLMs. Extensive experiments show that SPEX achieves $1.2 \sim 3 \times$ speedup for different ToT reasoning algorithms. Moreover, SPEX synergizes with token-level speculative decoding, achieving cumulative speedups of up to $4.1\times$. Ablation studies further confirm the contributions of each technique. Overall, SPEX represents a significant step toward efficient and scalable ToT reasoning, unlocking the parallelism required for high-performance inference-time scaling for LLMs.