Few-Step Diffusion via Score identity Distillation

Authors: Mingyuan Zhou, Yi Gu, Zhendong Wang

Categories: cs.CV, cs.LG, stat.ML

Published: 2025-05-19

🔗 Code/Project: GITHUB

💡 One-Sentence Summary

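Extends Score identity Distillation (SiD), a data-free distillation framework, to few-step generation, accelerating text-to-image diffusion models such as Stable Diffusion XL (SDXL).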

🎯 Matched areas: Pillar 2: RL Algorithms & Architecture; Pillar 4: Generative Motion

Keywords: diffusion models, distillation, text-to-image generation, data-free learning, few-step generation

📋 Key Points

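Existing diffusion distillation methods rely on real or teacher-synthesized images. Their use of classifier-free guidance (CFG) forces a trade-off between text-image alignment and generation diversity.
The paper adapts Score identity Distillation (SiD), a data-free one-step distillation framework, to few-step generation. It does so by matching a uniform mixture of the outputs from all generation steps to the data distribution.
Experiments on SD1.5 and SDXL show state-of-the-art performance in both one-step and few-step settings. The results also show robustness to the absence of real images.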

🔬 Method Details

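Problem definition: Existing diffusion distillation methods are used to accelerate high-resolution text-to-image models such as SDXL. These methods depend on real images or on images synthesized by the teacher model. Their use of classifier-free guidance (CFG) also forces a trade-off between text-image alignment and generation diversity. Both issues limit the efficiency and output quality of the distilled models.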

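Core idea: The paper adapts Score identity Distillation (SiD), a data-free one-step distillation framework, to few-step generation. SiD distills from the teacher's score rather than from training images, so it needs no real or teacher-synthesized data. For few-step generation, the uniform mixture of the outputs from all generation steps is matched to the data distribution. This lets a single shared network serve every step. The alignment-diversity trade-off of CFG is handled separately, through the Zero-CFG and Anti-CFG guidance strategies and an optional Diffusion GAN adversarial loss.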
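
The mixture-matching idea can be illustrated with a minimal sketch. It assumes a single shared generator `generator(x_t, sigma)` that maps a noisy input to a clean estimate, plus a hypothetical re-noising scheme between steps. At each call a step index is drawn uniformly, so the returned outputs are samples from the uniform mixture over all steps. The function names, the noise levels in `SIGMAS` and the toy Gaussian denoiser are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical noise levels for a 3-step generator (largest first); not the paper's values.
SIGMAS = [80.0, 10.0, 1.0]


def toy_generator(x_t, sigma):
    """Toy stand-in for the distilled generator: the exact posterior mean
    E[x0 | x_t] when x0 ~ N(0, I) and x_t = x0 + sigma * eps."""
    return x_t / (1.0 + sigma**2)


def sample_uniform_mixture(generator, shape, sigmas, rng):
    """Draw one sample from the uniform mixture of the K step outputs.

    The same generator is reused at every step (no step-specific networks).
    Step k is chosen uniformly, and only k + 1 steps are unrolled.
    """
    k = int(rng.integers(len(sigmas)))  # uniform over {0, ..., K-1}
    x = None
    for i in range(k + 1):
        eps = rng.standard_normal(shape)
        # Step 0 starts from (approximately) pure noise; later steps re-noise the previous output.
        x_t = sigmas[i] * eps if i == 0 else x + sigmas[i] * eps
        x = generator(x_t, sigmas[i])
    return x, k


rng = np.random.default_rng(0)
x, k = sample_uniform_mixture(toy_generator, (4,), SIGMAS, rng)
print(k, x.shape)
```

In a training loop, `x` would take the place of a one-step sample in the distillation loss. This sketch leaves open how noise levels are scheduled and whether gradients flow through earlier steps.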

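Technical framework: The SiD framework has five components:
1) a pretrained teacher diffusion model (score network);
2) the student being distilled, a one-step or few-step generator;
3) a fake score network that estimates the score of the student's output distribution, together with the Score identity Distillation loss, which trains the student so that this estimate matches the teacher's score on the same noisy student outputs;
4) an optional Diffusion GAN adversarial loss, used when real text-image pairs are available, to further improve quality and diversity;
5) the Zero-CFG or Anti-CFG guidance strategy, which mitigates the alignment-diversity trade-off introduced by CFG.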
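
To make the SiD loss concrete, here is a minimal NumPy sketch of the generator and fake-score objectives. It follows the original one-step SiD formulation as we understand it and is written in terms of denoiser (x0-prediction) outputs.
- `f_teacher` and `f_fake` stand for the teacher and fake score networks, both evaluated on the same noisy generator output x_t.
- `x_g` is the clean generator output.
- The coefficient `alpha=1.2`, the scalar `weight` (a stand-in for a time-dependent weighting) and all function names are assumptions for illustration.

The released code in the project repository should be treated as the authoritative reference.

```python
import numpy as np


def sid_generator_loss(f_teacher, f_fake, x_g, alpha=1.2, weight=1.0):
    """Generator loss (hypothetical sketch of the SiD objective):
    (1 - alpha) * ||f_teacher - f_fake||^2 + (f_teacher - f_fake)^T (f_fake - x_g),
    averaged over the batch (rows) and scaled by a stand-in weight."""
    diff = f_teacher - f_fake
    per_sample = (1.0 - alpha) * np.sum(diff**2, axis=-1) + np.sum(
        diff * (f_fake - x_g), axis=-1
    )
    return weight * float(np.mean(per_sample))


def fake_score_loss(f_fake, x_g, weight=1.0):
    """Denoising loss for the fake score network; x_g is treated as a fixed
    target (no gradient to the generator) when this loss is minimized."""
    return weight * float(np.mean(np.sum((f_fake - x_g) ** 2, axis=-1)))


# A batch of two 3-dimensional "images" with made-up predictions.
x_g = np.array([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]])
f_teacher = np.array([[0.6, -0.1, 0.1], [0.1, 0.2, -0.3]])
f_fake = np.array([[0.5, -0.2, 0.0], [0.0, 0.3, -0.4]])
print(sid_generator_loss(f_teacher, f_fake, x_g))
print(fake_score_loss(f_fake, x_g))
```

In actual training, the two losses would be minimized in alternation with automatic differentiation; NumPy here only evaluates their values. When the fake network's estimate agrees with the teacher (`f_teacher == f_fake`), the generator loss vanishes, which reflects the score-matching target.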

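Key innovations:
1) a data-free distillation method that removes the dependence on real or synthesized images;
2) efficient few-step generation without step-specific networks, obtained by matching the uniform mixture of outputs from all generation steps to the data distribution;
3) the Zero-CFG and Anti-CFG guidance strategies, which improve diversity without sacrificing text-image alignment.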

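Key design choices:
1) the Score identity Distillation loss, which measures the discrepancy between the teacher's score and the fake score network's estimate of the student's score;
2) the Diffusion GAN adversarial loss, applied to the uniform mixture of few-step outputs to improve quality and diversity;
3) Zero-CFG, which disables CFG in the teacher and removes text conditioning from the fake score network;
4) Anti-CFG, which applies negative CFG in the fake score network.

Both guidance strategies aim to increase diversity while preserving alignment.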
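
The two guidance strategies can be expressed with the standard classifier-free guidance combination uncond + scale * (cond - uncond). A scale of 1 gives the purely conditional prediction and a scale of 0 gives the unconditional one. The minimal sketch below assumes each network exposes conditional and unconditional predictions. Under that assumption:
- Zero-CFG corresponds to scale 1 for the teacher and scale 0 for the fake network.
- Anti-CFG corresponds to a negative scale for the fake network.

The Anti-CFG magnitude, the teacher setting under Anti-CFG and the function names are hypothetical placeholders, not values from the paper.

```python
import numpy as np


def cfg(pred_cond, pred_uncond, scale):
    """Classifier-free guidance: scale=1 -> conditional, scale=0 -> unconditional,
    scale<0 -> pushed away from the text condition."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


# Hypothetical (teacher, fake) guidance scales; only their roles follow the text.
GUIDANCE = {
    "zero_cfg": (1.0, 0.0),   # teacher: CFG disabled; fake: text conditioning removed
    "anti_cfg": (1.0, -1.0),  # fake: negative CFG (magnitude is an assumption)
}


def guided_predictions(mode, teacher_cond, teacher_uncond, fake_cond, fake_uncond):
    """Return the (teacher, fake) predictions that would feed the distillation loss."""
    w_teacher, w_fake = GUIDANCE[mode]
    teacher = cfg(teacher_cond, teacher_uncond, w_teacher)
    fake = cfg(fake_cond, fake_uncond, w_fake)
    return teacher, fake


tc, tu = np.array([1.0, 2.0]), np.array([0.0, 1.0])
fc, fu = np.array([0.5, 0.5]), np.array([0.2, 0.1])
print(guided_predictions("anti_cfg", tc, tu, fc, fu))
```

Expressing both strategies as guidance scales leaves the distillation loss itself unchanged; only its inputs differ.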

📊 Experimental Highlights

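SiD achieves state-of-the-art performance on SDXL at 1024x1024 resolution. Experiments on SD1.5 and SDXL show state-of-the-art results in both one-step and few-step settings, along with robustness to the absence of real images. The Zero-CFG and Anti-CFG guidance strategies mitigate the alignment-diversity trade-off, improving diversity without sacrificing alignment.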

🎯 Application Scenarios

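Potential applications include fast text-to-image generation, image editing and content creation. Reducing sampling to one or a few steps lowers the inference cost of diffusion models. This can make them easier to deploy on resource-constrained devices and faster to respond to users. In the longer term, such techniques may support wider adoption of AI-generated content (AIGC).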

📄 Abstract (Original)

Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models by distilling a pretrained score network into a one- or few-step generator. While existing methods have made notable progress, they often rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models such as Stable Diffusion XL (SDXL), and their use of classifier-free guidance (CFG) introduces a persistent trade-off between text-image alignment and generation diversity. We address these challenges by optimizing Score identity Distillation (SiD) -- a data-free, one-step distillation framework -- for few-step generation. Backed by theoretical analysis that justifies matching a uniform mixture of outputs from all generation steps to the data distribution, our few-step distillation algorithm avoids step-specific networks and integrates seamlessly into existing pipelines, achieving state-of-the-art performance on SDXL at 1024x1024 resolution. To mitigate the alignment-diversity trade-off when real text-image pairs are available, we introduce a Diffusion GAN-based adversarial loss applied to the uniform mixture and propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network. This flexible setup improves diversity without sacrificing alignment. Comprehensive experiments on SD1.5 and SDXL demonstrate state-of-the-art performance in both one-step and few-step generation settings, along with robustness to the absence of real images. Our efficient PyTorch implementation, along with the resulting one- and few-step distilled generators, will be released publicly as a separate branch at https://github.com/mingyuanzhou/SiD-LSG.
