LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification

Authors: Xuan Li, Yining Wang, Yuchen Liu, Guanjun Liu, Delai Qiu, Shengping Liu, Jiaen Liang, Wei Huang, Jun Yu, Junnan Zhu

Category: cs.CL

Published: 2026-05-08

🔗 Code/Project: GitHub (https://github.com/TioeAre/LaTER)

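💡 One-Sentence Summary

Proposes LaTER, a reasoning framework that achieves efficient test-time reasoning through latent-space exploration followed by explicit verification.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: large language models, chain-of-thought reasoning, latent-space exploration, inference efficiency optimization, mathematical reasoning, model compression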

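📋 Key Points

Existing CoT reasoning generates discrete tokens one at a time, which makes reasoning traces long and inference expensive and limits real-time use on complex tasks.
LaTER proposes a two-stage reasoning paradigm: bounded exploration in a continuous latent space first, then a switch to explicit CoT for verification and answer generation, balancing compute cost against accuracy.
Experiments on Qwen3-14B show that LaTER reduces token usage in both the training-free and the fine-tuned setting; on AIME 2025, the fine-tuned model scores 10.0 points above the standard CoT baseline.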

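📝 Abstract (Translated Summary)

Chain-of-thought (CoT) reasoning improves large language models on complex tasks, but inference is expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation tends to weaken symbolic checking. This paper therefore proposes Latent-Then-Explicit Reasoning (LaTER), a paradigm that first performs bounded exploration in a continuous latent space and then switches to explicit CoT for verification and answer generation. In the training-free instantiation, LaTER projects final-layer hidden states back into the input embedding space, preserves the latent KV cache, and uses entropy and stop-token probes to decide when to switch. On Qwen3-14B, training-free LaTER reduces total token usage by 16%-32% on several benchmarks while matching or improving accuracy on most of them. Fine-tuning on the constructed Latent-Switch-69K dataset brings further gains: trained LaTER reaches 80.0% accuracy on AIME 2025, 10.0 points above the standard CoT baseline, while using 33% fewer tokens.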

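🔬 Method Details

Problem definition: Current LLM reasoning relies mainly on explicit chain-of-thought, which produces long reasoning paths and wastes compute. Pure latent reasoning compresses that computation, but it struggles on tasks that need exact symbolic verification.

Core idea: LaTER combines latent exploration with explicit verification. It uses the structured trajectories of the model's internal hidden states for preliminary reasoning, then switches to explicit generation for verification and the final answer, which keeps the reasoning checkable while generating far fewer tokens.

Technical framework: The framework has two stages. Stage one is latent exploration: final-layer hidden states are projected into the embedding space and fed back as inputs, with the KV cache preserved across steps. Stage two is explicit reasoning: entropy monitoring and a stop-token probe trigger the switch, after which the model runs standard CoT generation.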
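The latent-exploration stage can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not specify how the projection is computed, so the sketch assumes a probability-weighted average of token embeddings, which is one common training-free choice. The names `project_to_embedding`, `latent_rollout`, and `toy_step` are hypothetical, and a plain Python list stands in for the KV cache.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def project_to_embedding(hidden, lm_head, embed):
    """Map a final-layer hidden state (d,) to a soft input embedding (d,).

    Assumption: the projection is the probability-weighted average of
    token embeddings. The paper's actual projection may differ.
    """
    probs = softmax(lm_head @ hidden)    # next-token distribution, (vocab,)
    return probs @ embed, probs          # soft embedding (d,), distribution

def latent_rollout(step_fn, first_embedding, lm_head, embed, max_steps):
    """Run a bounded number of latent steps without emitting tokens."""
    kv_cache = []                        # stand-in for a real KV cache
    x = first_embedding
    trace = []                           # distributions, usable by a switch probe
    for _ in range(max_steps):
        hidden, kv_cache = step_fn(x, kv_cache)   # one forward step
        x, probs = project_to_embedding(hidden, lm_head, embed)
        trace.append(probs)
    return x, kv_cache, trace

def toy_step(x, kv_cache):
    """Hypothetical stand-in for a transformer forward pass."""
    kv_cache = kv_cache + [x]            # the cache grows by one entry per step
    hidden = np.tanh(np.mean(kv_cache, axis=0))
    return hidden, kv_cache

rng = np.random.default_rng(0)
vocab, d = 8, 4
embed = rng.normal(size=(vocab, d))      # toy input embedding matrix
lm_head = rng.normal(size=(vocab, d))    # toy output head
x, cache, trace = latent_rollout(toy_step, embed[0], lm_head, embed, max_steps=3)
```

In a real model the cache returned by the latent stage would be passed unchanged into explicit decoding, so the explicit CoT can attend to the latent steps without recomputing them.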

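Key innovations: A training-free latent probing mechanism, together with the finding that strong reasoning models already exhibit structured latent trajectories under this interface. The authors also build Latent-Switch-69K, a supervised corpus that pairs condensed solution intuitions with shortened explicit derivations and is used for fine-tuning with latent rollout and halting supervision.

Key design: An entropy-based dynamic switching strategy monitors the uncertainty of the model's output distribution, alongside a stop-token probe, to decide when to leave the latent space for explicit generation. The latent KV cache is preserved across the switch, so explicit decoding continues from the latent steps.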
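A switching rule of this kind might look like the following sketch. The summary does not give the thresholds or say whether low or high entropy triggers the switch, so everything here is an assumption: the function name `should_switch`, all default values, and the choice to switch when entropy falls below a threshold (that is, when the model looks confident). The step cap reflects the "bounded exploration" described in the abstract.

```python
import math

def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_switch(probs, stop_token_id, step, *,
                  entropy_threshold=0.5,     # hypothetical value
                  stop_prob_threshold=0.5,   # hypothetical value
                  max_latent_steps=64):      # hypothetical bound
    """Decide whether to leave latent exploration for explicit CoT.

    Assumed rule: switch if the latent budget is exhausted, if the
    stop-token probe fires, or if the distribution has low entropy.
    """
    if step >= max_latent_steps:
        return True                          # exploration is bounded
    if probs[stop_token_id] >= stop_prob_threshold:
        return True                          # stop-token probe fires
    return entropy(probs) <= entropy_threshold

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
stop_heavy = [0.1, 0.1, 0.2, 0.6]

keep_exploring = should_switch(uniform, stop_token_id=3, step=0)
confident = should_switch(peaked, stop_token_id=3, step=0)
probe_fired = should_switch(stop_heavy, stop_token_id=3, step=0)
budget_hit = should_switch(uniform, stop_token_id=3, step=64)
```

Keeping the three conditions separate makes each one easy to tune or disable on its own, which matters because the right entropy direction and threshold are likely model-dependent.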

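📊 Experimental Highlights

On Qwen3-14B, training-free LaTER reduces total token usage by 16%-32% on several benchmarks while matching or improving accuracy on most of them. On AIME 2025, for example, it raises accuracy from 70.0% to 73.3% while cutting tokens from 15,730 to 10,661. After fine-tuning, LaTER reaches 80.0% on AIME 2025, 10.0 points above the 70.0% standard CoT baseline, while using 33% fewer tokens.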
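As a quick check of the training-free AIME 2025 numbers, the relative token saving can be recomputed from the two token counts reported in the original abstract (15,730 and 10,661). The helper name `relative_reduction` is illustrative only.

```python
def relative_reduction(before, after):
    """Fraction of tokens saved relative to the baseline count."""
    return (before - after) / before

# Token counts and accuracies as reported in the original abstract.
token_saving = relative_reduction(15_730, 10_661)
accuracy_gain_points = 73.3 - 70.0

print(f"token saving: {token_saving:.1%}")                    # 32.2%
print(f"accuracy gain: {accuracy_gain_points:.1f} points")    # 3.3 points
```

The 32.2% saving sits at the upper end of the 16%-32% range reported across benchmarks.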

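🎯 Application Scenarios

The technique suits settings that are sensitive to inference cost yet demand high accuracy, such as automated mathematical proof, complex code generation, scientific computing assistance, and decision systems with long logical chains. By lowering the number of tokens generated during reasoning, LaTER could make LLM deployment more efficient on edge devices or in high-concurrency cloud environments and help complex reasoning workloads scale.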

📄 Abstract (Original)

Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploration in a continuous latent space and then switches to explicit CoT for verification and answer generation. In a training-free instantiation, LaTER projects final-layer hidden states back to the input embedding space, preserves the latent KV cache, and uses entropy and model-native stop-token probes to decide when to switch. We find that strong reasoning models already exhibit structured latent trajectories under this interface. On Qwen3-14B, training-free LaTER reduces total token usage by 16%-32% on several benchmarks while matching or improving accuracy on most of them; for example, it improves AIME 2025 from 70.0% to 73.3% while reducing tokens from 15,730 to 10,661. We further construct Latent-Switch-69K, a supervised corpus that pairs condensed solution intuitions with shortened explicit derivations. Fine-tuning with latent rollout and halting supervision yields additional gains: trained LaTER reaches 80.0% accuracy on AIME 2025, 10.0 points above the standard CoT baseline, while using 33% fewer tokens. Our code, data, and model are available at https://github.com/TioeAre/LaTER.
