SimSD: Simple Speculative Decoding in Diffusion Language Models

Authors: Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo, Jinya Jiang, Haoru Li, Chaojie Ren, Yiming Huang, Kaijie Zhu, Zhongkai Yu, Kun Zhou, Jingbo Shang

Categories: cs.CL, cs.AI

Published: 2026-06-01

Comments: 13 pages, 4 figures, code available at https://github.com/airevo2/SimSD-release

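💡 One-sentence takeaway

Proposes SimSD, a training-free speculative decoding algorithm for diffusion language models that raises decoding throughput by up to 7.46x.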

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: diffusion language models, speculative decoding, inference acceleration, attention mask, parallel decoding

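📋 Key points

Diffusion language models offer fast inference through parallel decoding, but they are incompatible with token-level speculative decoding, which limits how much further they can be accelerated.
SimSD introduces reference tokens and a purpose-built attention mask that give the diffusion model temporally valid token-level context, making speculative decoding possible.
Experiments show that SimSD raises the decoding throughput of diffusion language models by up to 7.46x without degrading generation quality.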

🔬 Method details

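Problem definition: Diffusion language models (dLLMs) can decode in parallel, but their masked language modeling formulation is incompatible with the token-level speculative decoding widely used for autoregressive models. Unlike an autoregressive model, an existing dLLM cannot verify multiple drafted tokens in a single forward pass, which limits its inference efficiency.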
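For reference, the single-pass verification that autoregressive models get from causal masking can be sketched as follows. This is a minimal greedy variant, not the paper's code: `target_next_tokens` is a hypothetical stand-in for one causal forward pass of the target model, and `toy_target` is an invented toy model used only to make the sketch runnable.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next_tokens: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Greedy verification: keep the longest draft prefix the target agrees with.

    `target_next_tokens(seq)` is a hypothetical stand-in for ONE causal forward
    pass: for every position i it returns the target model's greedy prediction
    for the token that follows seq[: i + 1].
    """
    if not prefix:
        raise ValueError("prefix must contain at least one token")
    seq = list(prefix) + list(draft)
    preds = target_next_tokens(seq)  # one pass scores every draft position
    accepted: List[int] = []
    for j, tok in enumerate(draft):
        # The prediction for draft[j] comes from the position just before it.
        target_tok = preds[len(prefix) + j - 1]
        if tok != target_tok:
            accepted.append(target_tok)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
    # Every draft token matched: the same pass also yields one extra token.
    accepted.append(preds[len(seq) - 1])
    return accepted


def toy_target(seq: Sequence[int]) -> List[int]:
    # Toy "model" (assumption for the demo): next token is (current + 1) mod 10.
    return [(t + 1) % 10 for t in seq]


if __name__ == "__main__":
    print(verify_draft([1, 2, 3], [4, 5, 9, 7], toy_target))
```

The causal mask is what makes `preds` valid for every draft position at once: each position only sees tokens to its left, so its prediction is the same as if the sequence had been decoded one token at a time. A dLLM with bidirectional attention lacks this property out of the box.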

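Core idea: SimSD gives dLLMs temporally valid token-level context so that they can perform speculative decoding. Concretely, it uses a plug-and-play masking strategy: it explicitly introduces reference tokens taken from the draft model's predictions and designs a dedicated attention mask, so that the dLLM can compute valid logits for the drafted tokens.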
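One way to picture such a mask is the sketch below. The sequence layout, the function name `build_verification_mask`, and the exact visibility rules are assumptions made for illustration; the summary does not specify them, and the paper's actual mask may differ. The only property the sketch aims to show is that the slot scoring draft token i sees the draft tokens before it, but not token i itself or any later one.

```python
from typing import List


def build_verification_mask(n_ctx: int, n_draft: int) -> List[List[bool]]:
    """Boolean attention mask; mask[q][k] is True when query q may attend to key k.

    Assumed layout (illustrative, not taken from the paper):
      [ context (n_ctx) | reference tokens r_0..r_{n-1} | mask slots m_0..m_{n-1} ]
    Slot m_i is a mask token whose logits are used to score draft token i.
    """
    total = n_ctx + 2 * n_draft
    mask = [[False] * total for _ in range(total)]
    ref0 = n_ctx              # index of r_0
    slot0 = n_ctx + n_draft   # index of m_0

    for q in range(total):
        for k in range(n_ctx):
            mask[q][k] = True  # every query sees the already-decoded context

    for i in range(n_draft):
        for j in range(i + 1):
            mask[ref0 + i][ref0 + j] = True   # r_i sees r_0..r_i (causal)
        for j in range(i):
            mask[slot0 + i][ref0 + j] = True  # m_i sees only earlier references
        mask[slot0 + i][slot0 + i] = True     # m_i sees itself
    return mask


if __name__ == "__main__":
    for row in build_verification_mask(n_ctx=2, n_draft=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Because context queries never attend to reference tokens or slots in this layout, the context representations do not depend on the draft, which is the kind of property that would let such a scheme coexist with a KV cache.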

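Technical framework: The SimSD pipeline has four main steps: 1) a smaller draft model predicts several draft tokens; 2) these draft tokens are explicitly added to the dLLM input as reference tokens; 3) an attention mask controls how the reference tokens interact with the current-step tokens, ensuring that the dLLM computes the logits of the drafted tokens from temporally valid context; 4) the dLLM verifies the drafted tokens, accepting the correct ones and rejecting the incorrect ones.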
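The four steps can be read as a draft-then-verify loop. The sketch below shows only that control flow under stated assumptions: `speculative_decode`, `toy_draft`, and `toy_verify` are hypothetical names, the "models" are toy arithmetic rules, and `toy_verify` stands in for what would be a single masked forward pass of the dLLM (steps 2 to 4).

```python
from typing import Callable, List, Sequence, Tuple


def speculative_decode(
    prompt: Sequence[int],
    draft_fn: Callable[[Sequence[int], int], List[int]],
    verify_fn: Callable[[Sequence[int], Sequence[int]], List[int]],
    max_new_tokens: int,
    k: int,
) -> Tuple[List[int], int]:
    """Generic draft-then-verify loop; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_fn(out, k)          # step 1: propose k draft tokens
        accepted = verify_fn(out, draft)  # steps 2-4: one target-model pass
        target_passes += 1
        out.extend(accepted)              # verify_fn must return >= 1 token
    return out[: len(prompt) + max_new_tokens], target_passes


def toy_next(tok: int) -> int:
    return (tok + 1) % 10  # toy target rule (assumption for the demo)


def toy_draft(seq: Sequence[int], k: int) -> List[int]:
    out, cur = [], seq[-1]
    for _ in range(k):
        cur = toy_next(cur)
        if cur == 7:
            cur = 0  # deliberate draft error so that some tokens get rejected
        out.append(cur)
    return out


def toy_verify(seq: Sequence[int], draft: Sequence[int]) -> List[int]:
    accepted, cur = [], seq[-1]
    for tok in draft:
        want = toy_next(cur)
        if tok != want:
            accepted.append(want)  # reject: substitute the target's own token
            return accepted
        accepted.append(tok)
        cur = tok
    return accepted


if __name__ == "__main__":
    print(speculative_decode([1], toy_draft, toy_verify, max_new_tokens=10, k=4))
```

In this toy run ten new tokens are produced with three target passes, because most drafted tokens are accepted; a poor draft model would push the pass count back toward one per token.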

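Key innovation: The main novelty of SimSD is its masking strategy, which gives dLLMs the temporally valid context required for token-level speculative decoding without changing the model architecture or training procedure. Compared with existing methods, SimSD needs no additional training of the dLLM to deliver a substantial inference speedup.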

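Key design: The key design choices in SimSD are: 1) how the reference tokens are introduced, so that they effectively influence the dLLM's predictions; 2) the attention mask, which must precisely control the interaction between reference tokens and current-step tokens to avoid information leakage and keep the logits valid; 3) the choice of draft model, which trades speed against accuracy to maximize overall decoding throughput.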
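The speed-versus-accuracy trade-off in the draft model can be made concrete with the standard back-of-the-envelope analysis of autoregressive speculative decoding. The sketch below assumes each draft token is accepted independently with probability `alpha`, that `gamma` tokens are drafted per round, and that one draft pass costs `cost_ratio` times a target pass. These symbols and the baseline of one token per target pass are assumptions of this sketch, not figures from the paper, and SimSD's own accounting for dLLMs may differ.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target pass when gamma tokens are drafted
    and each is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)
    # Geometric series 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Speedup over one-token-per-pass decoding.

    cost_ratio is (cost of one draft pass) / (cost of one target pass).
    """
    return expected_tokens_per_pass(alpha, gamma) / (gamma * cost_ratio + 1.0)


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, gamma=4, cost_ratio=0.1), 3))
```

Under these assumptions a more accurate draft model raises `alpha` and a cheaper one lowers `cost_ratio`; the two usually pull in opposite directions, which is the trade-off named in point 3.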

📊 Experimental highlights

SimSD was evaluated on SDAR-family dLLMs across four benchmarks. It raises decoding throughput by up to 7.46x while maintaining, and even improving, average generation quality.
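A throughput multiplier such as the one reported here is a ratio of tokens per second between two decoding setups. The helper below is a minimal sketch of that arithmetic; the function names and the numbers in the usage line are hypothetical and are not measurements from the paper.

```python
def throughput(tokens: int, seconds: float) -> float:
    """Decoding throughput in tokens per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


def throughput_speedup(
    base_tokens: int, base_seconds: float, new_tokens: int, new_seconds: float
) -> float:
    """Ratio of the accelerated throughput to the baseline throughput."""
    return throughput(new_tokens, new_seconds) / throughput(base_tokens, base_seconds)


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    print(throughput_speedup(1000, 10.0, 1000, 2.5))
```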

🎯 Application scenarios

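SimSD applies to scenarios that need fast text generation, such as dialogue systems, machine translation, and text summarization. By speeding up diffusion language model inference, it can lower compute cost and improve user experience. It can also be combined with other acceleration techniques to further improve the performance of diffusion language models.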

📄 Abstract (original)

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) LLMs, offering faster inference through parallel or blockwise decoding. However, their masked language modeling formulation remains incompatible with standard token-level speculative decoding, one of the most effective acceleration techniques for AR models. In AR decoding, the causal mask preserves temporally valid token-level contexts, enabling a target model to verify multiple drafted tokens in a single forward pass. In contrast, dLLMs rely on mask tokens and bidirectional attention, causing the effective context to change across denoising steps and preventing direct token-level speculative verification. To bridge this gap, we propose a simple but effective speculative decoding algorithm for diffusion language models, named SimSD, which mainly adopts a plug-and-play masking strategy that equips dLLMs with temporally valid token-level contexts for speculative decoding. Our method explicitly introduces reference tokens from draft-model predictions and designs an attention mask that regulates their interaction with current-step tokens, allowing dLLMs to compute valid logits for drafted tokens in a single forward pass. This restores the key verification ability provided by causal masking in AR models while preserving the parallel decoding advantages of dLLMs. The proposed method is training-free and can be flexibly integrated with other acceleration techniques such as KV cache and blockwise decoding. Experiments on SDAR-family dLLMs across four benchmarks show that our method achieves up to 7.46x higher decoding throughput while maintaining and even improving average generation quality.
