Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models

📄 arXiv: 2605.07194v1 📥 PDF

Authors: Bincheng Peng, Guang Li, Ping Liu, Takahiro Ogawa, Miki Haseyama

Categories: cs.CV, cs.AI, cs.LG

Published: 2026-05-08


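💡 One-sentence takeaway

Proposes CLP-DD, which uses a closed-form solution to make dataset distillation for pre-trained vision models efficient.

🎯 Matched area: Pillar 2: RL Algorithms and Architecture (RL & Architecture)

Keywords: dataset distillation, pre-trained models, linear probing, kernel ridge regression, visual transfer learning, model compression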

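📋 Key points

  1. Existing dataset distillation methods for pre-trained models either rely on costly trajectory-based gradient matching or on closed-form formulations built on NTK approximations designed for from-scratch training; neither optimizes directly for linear probing.
  2. CLP-DD uses a kernel ridge solver to compute the linear classifier induced by the synthetic set directly, so that updating the synthetic images becomes a matter of optimizing class anchors in feature space.
  3. On ImageNet-1K, the method matches or surpasses LGM with DSA on three of four backbones while running roughly 14× faster and using less than one-eighth of the GPU memory.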

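📝 Abstract (summary)

Dataset distillation compresses a large training set into a tiny synthetic set that preserves downstream utility. Most existing methods target networks trained from scratch, or rely on trajectory-based gradient matching and neural tangent kernel (NTK) approximations, which are computationally expensive. This paper proposes Closed-Form Linear-Probe Dataset Distillation (CLP-DD) for linear probing on frozen pre-trained features: a kernel ridge solver directly computes the linear classifier induced by the synthetic set, with no inner-loop iterations. A discriminative outer loss based on temperature-scaled softmax cross-entropy then drives the synthetic-image updates, with the classifier columns acting as class anchors in feature space. Experiments show that CLP-DD performs strongly on ImageNet-100 and ImageNet-1K: on ImageNet-1K it matches or surpasses LGM with DSA on three of four backbones while running roughly 14× faster and using less than one-eighth of the GPU memory.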

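🔬 Method details

Problem definition: The paper addresses dataset distillation for pre-trained vision models whose features are frozen. Existing methods do not exploit the fact that linear probing admits a closed-form solution, so they remain computationally expensive and inefficient on large-scale datasets.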

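Core idea: Because linear probing on frozen features has a closed-form solution, the classifier induced by the synthetic set can be solved directly with kernel ridge regression. Optimizing the synthetic images is then treated as a search for the best class anchors in feature space, which avoids inner-loop gradient updates altogether.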
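As a worked illustration of the closed form mentioned above, one standard way to write a ridge-regression linear probe is given below. The notation is an assumption made here for illustration and may differ from the paper's: $F_s \in \mathbb{R}^{m \times d}$ holds the frozen-encoder features of the $m$ synthetic images, $Y_s \in \mathbb{R}^{m \times c}$ their one-hot labels, and $\lambda > 0$ is a ridge coefficient.

$$
W(S) \;=\; F_s^\top \left( F_s F_s^\top + \lambda I_m \right)^{-1} Y_s \;=\; \left( F_s^\top F_s + \lambda I_d \right)^{-1} F_s^\top Y_s
$$

The first (sample-space) form inverts an $m \times m$ matrix, which is cheap when the synthetic set is much smaller than the feature dimension $d$. The two forms are equal by the usual push-through identity for ridge regression.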

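Technical framework: CLP-DD is a bilevel optimization. The inner level uses a kernel ridge solver to compute the linear classifier corresponding to the synthetic set. The outer level evaluates that classifier on real features and backpropagates a discriminative loss (temperature-scaled softmax cross-entropy) to update the synthetic images.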
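The inner solve and the outer evaluation described above can be sketched as follows. This is a minimal NumPy forward pass under stated assumptions, not the authors' implementation: the features are random stand-ins for encoder outputs, the names `kernel_ridge_probe`, `feats_syn`, `feats_real`, and the ridge coefficient `lam` are hypothetical, and the gradient step back to the synthetic images (which would need an autodiff framework) is omitted.

```python
import numpy as np

def kernel_ridge_probe(feats_syn, labels_onehot, lam=1e-3):
    """Closed-form linear probe from synthetic features (sample-space form).

    feats_syn:     (m, d) features of the m synthetic images
    labels_onehot: (m, c) one-hot targets
    returns W:     (d, c) linear classifier, one column per class
    """
    m = feats_syn.shape[0]
    gram = feats_syn @ feats_syn.T                                   # (m, m) linear kernel
    alpha = np.linalg.solve(gram + lam * np.eye(m), labels_onehot)   # (m, c) dual weights
    return feats_syn.T @ alpha                                       # (d, c)

rng = np.random.default_rng(0)
d, c, per_class = 16, 3, 2

# Stand-ins for frozen-encoder features: one well-separated cluster per class.
centers = 3.0 * rng.normal(size=(c, d))
labels_syn = np.repeat(np.arange(c), per_class)
feats_syn = centers[labels_syn] + 0.1 * rng.normal(size=(c * per_class, d))
Y_syn = np.eye(c)[labels_syn]

# Inner level: solve for the classifier induced by the synthetic set.
W = kernel_ridge_probe(feats_syn, Y_syn)

# Outer level (forward part only): evaluate that classifier on "real" features.
labels_real = rng.integers(0, c, size=200)
feats_real = centers[labels_real] + 0.5 * rng.normal(size=(200, d))
logits_real = feats_real @ W
accuracy = float((logits_real.argmax(axis=1) == labels_real).mean())
```

Only an `m × m` system is solved per step, where `m` is the synthetic-set size, which is why no inner training loop is needed in this sketch.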

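Key innovation: The method drops conventional trajectory matching and NTK approximations and optimizes directly through the closed-form solution. The paper also shows that the outer objective is decisive for a closed-form solver: with a standard MSE outer loss the method substantially underperforms trajectory-based methods, while the discriminative outer loss closes most of that gap.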

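Key design: A temperature-scaled softmax cross-entropy serves as the outer loss, and the classifier weights are treated as learnable class anchors. Optimizing the synthetic images makes these anchors more discriminative in feature space, which yields an efficient compression of the dataset's knowledge.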
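To illustrate the outer loss named above, here is a small NumPy sketch of a temperature-scaled softmax cross-entropy. The function name `temperature_scaled_ce`, the anchor matrix, the features, and the temperature values are all illustrative assumptions; the source does not state which temperature the paper uses or whether features are normalized.

```python
import numpy as np

def temperature_scaled_ce(logits, labels, tau=0.1):
    """Mean softmax cross-entropy of (logits / tau) against integer labels."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)                   # stabilise the exponentials
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Hypothetical classifier: each column is a class anchor in a 2-D feature space.
W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])

# Hypothetical real features, each lying near the anchor of its own class.
feats = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [-0.7, -0.6]])
labels = np.array([0, 1, 2])

logits = feats @ W                                          # similarity to each anchor
loss_sharp = temperature_scaled_ce(logits, labels, tau=0.1)
loss_soft = temperature_scaled_ce(logits, labels, tau=1.0)
```

In this toy case every sample is already closest to its own anchor, so the lower temperature gives the smaller loss. For a misclassified sample the effect reverses and the lower temperature penalizes it more heavily.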

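📊 Experimental highlights

On ImageNet-100, CLP-DD substantially outperforms LGM without DSA and approaches LGM with DSA. On ImageNet-1K, it matches or surpasses LGM+DSA on three of four pre-trained backbones while running roughly 14× faster and using less than one-eighth of the GPU memory.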

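🎯 Application scenarios

The technique is suited to resource-constrained edge devices, rapid prototyping on large-scale datasets, and data-privacy protection in federated learning. Compressing massive datasets into very small synthetic sets can substantially lower the compute barrier for fine-tuning pre-trained models and speed up the deployment and iteration of vision tasks.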

📄 Abstract (original)

Dataset distillation compresses a large training set into a small synthetic set that preserves downstream training utility. While most existing methods target training networks from scratch, modern visual transfer learning often uses frozen pre-trained encoders followed by lightweight linear probing. Existing distillation methods for this setting either unroll iterative linear-probe updates with trajectory-based gradient matching, or rely on closed-form formulations originally designed for from-scratch training with neural-tangent-kernel (NTK) approximations. Neither route exploits the fact that frozen-feature linear probing admits a closed-form solution determined directly by the pre-trained features themselves, with no infinite-width approximation and no inner-loop trajectory. We propose Closed-Form Linear-Probe Dataset Distillation (CLP-DD), a bilevel formulation that computes the linear probe induced by the synthetic set with a sample-space kernel ridge solver. The synthetic images are then updated by evaluating this induced classifier on real features through a temperature-scaled softmax cross-entropy, where the classifier columns act as learned class anchors in feature space. We further show that the choice of outer objective is decisive: pairing the closed-form inner solver with a standard MSE outer loss substantially underperforms trajectory-based methods, while the discriminative outer loss closes most of the gap. On ImageNet-100 with four pre-trained backbones, CLP-DD substantially improves over LGM without DSA and approaches LGM with DSA at a fraction of the computational cost. On ImageNet-1K, CLP-DD matches or surpasses LGM with DSA on three of four backbones while running roughly $14\times$ faster and using less than one-eighth of the GPU memory.