Closed-Form Spectral Regularization for Multi-Task Model Merging

📄 arXiv: 2606.07289v1 📥 PDF

Authors: Yongxian Wei, Runxi Cheng, Xingxuan Zhang, Li Shen, Chun Yuan, Peng Cui, Dacheng Tao

Categories: cs.LG, cs.CV

Published: 2026-06-05


💡 One-Sentence Takeaway

Proposes closed-form spectral regularization for multi-task model merging.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: model merging, spectral regularization, multi-task learning, linear inverse problem, adaptive algorithm

📋 Key Points

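  1. State-of-the-art merging methods rely on hundreds of gradient-descent iterations that dominate the cost of the pipeline, while the exact closed-form pseudoinverse solution underperforms them in practice because small-eigenvalue directions amplify proxy noise.
  2. The paper proposes SWUDI, a closed-form spectral filtering estimator that combines a soft exponential filter with a hard top-K truncation.
  3. Across four general benchmarks and a multimodal merging benchmark, the method matches or outperforms state-of-the-art merging methods while reducing wall-clock time by 28-72x and peak GPU memory by up to 50%.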

📝 Abstract (Summary)

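Model merging combines several independently fine-tuned expert models into a single multi-task model without any training data, reducing the storage, serving, and decentralized-development costs of large foundation models. State-of-the-art merging methods treat merging as a layer-wise quadratic interference minimization problem. Although this problem admits an exact closed-form pseudoinverse solution, that solution underperforms hundreds of iterations of gradient descent in practice. The paper revisits this regime and shows that the iterative solver acts less as an optimizer than as an implicit spectral regularizer for an ill-posed normal equation. Building on this finding, it formalizes multi-task model merging as a noisy linear inverse problem and proposes a spectral filtering estimator parameterized by a per-direction filter. The estimator is instantiated as SWUDI, a closed-form method that combines a soft exponential filter with a hard top-K truncation. An adaptive variant, SWUDI-A, replaces the global rank hyperparameter with per-layer rank rules, further improving robustness across architectures.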

🔬 Method Details

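Problem definition: The paper addresses layer-wise quadratic interference minimization in multi-task model merging. The problem has an exact closed-form pseudoinverse solution, but in practice that solution underperforms hundreds of iterations of gradient descent. The iterative loop dominates the cost of the pipeline, and the reason it works better has remained unexplained.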

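Core idea: The paper treats model merging as a noisy linear inverse problem and uses spectral regularization to suppress the noise amplification that occurs along small-eigenvalue directions of the per-layer interference operator, which improves the merged model.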
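The noise-amplification argument can be written out in generic inverse-problem notation. The symbols below ($A$ for the symmetric positive semidefinite per-layer operator, $b$ for the observed right-hand side, $x^\star$ for the clean solution, $\varepsilon$ for the proxy noise, $g$ for the filter, $P$ for the projector onto the range of $A$) are assumptions made for illustration and are not taken from the paper.

```latex
% Generic notation (assumed, not the paper's own symbols).
\begin{align}
A &= \sum_i \lambda_i\, v_i v_i^\top, \qquad b = A x^\star + \varepsilon, \\
\hat{x}_g &= \sum_i g(\lambda_i)\,\bigl(v_i^\top b\bigr)\, v_i, \\
\hat{x}_{\mathrm{pinv}} &= A^{+} b
  = P x^\star + \sum_{\lambda_i > 0} \frac{v_i^\top \varepsilon}{\lambda_i}\, v_i .
\end{align}
```

The pseudoinverse corresponds to the filter $g(\lambda) = 1/\lambda$ on nonzero eigenvalues, so each noise component is scaled by $1/\lambda_i$ and blows up as $\lambda_i \to 0$. A regularizing filter keeps $g(\lambda)$ bounded on small eigenvalues instead.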

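Technical framework: The core component is the spectral filtering estimator SWUDI. It combines a soft exponential filter, which matches the gradient-flow trajectory of iterative descent, with a hard top-K truncation that suppresses noise-amplifying small-eigenvalue directions. The adaptive variant SWUDI-A replaces the global rank hyperparameter with per-layer rank rules.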
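One way to picture a filter of this soft-plus-hard kind is the NumPy sketch below. It assumes the per-layer operator is a symmetric positive semidefinite matrix `A` with right-hand side `b`. The function name, the flow-time parameter `t`, and the gain formula (1 - exp(-t*lambda)) / lambda (the gradient-flow solution of a quadratic started from zero) are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def spectral_filter_solve(A, b, t=1.0, k=None):
    """Sketch: solve A x ~ b with an exponential filter plus top-k truncation.

    A: symmetric PSD (n, n) matrix (stand-in for a per-layer operator).
    b: (n,) or (n, m) right-hand side.
    t: flow time of the soft filter (assumed hyperparameter).
    k: number of leading eigen-directions to keep (None keeps all).
    """
    eigvals, eigvecs = np.linalg.eigh(A)          # one symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)  # clip tiny negative round-off
    eigvecs = eigvecs[:, order]

    # Soft filter: g(lam) = (1 - exp(-t * lam)) / lam, with limit t as lam -> 0.
    positive = eigvals > 1e-12
    safe = np.where(positive, eigvals, 1.0)
    gain = np.where(positive, -np.expm1(-t * eigvals) / safe, t)

    # Hard filter: drop every direction beyond the k largest eigenvalues.
    if k is not None:
        gain[k:] = 0.0

    coeffs = eigvecs.T @ b
    if coeffs.ndim == 1:
        return eigvecs @ (gain * coeffs)
    return eigvecs @ (gain[:, None] * coeffs)
```

With `A = np.diag([4.0, 1.0, 1e-6])` and `b = np.ones(3)`, the plain pseudoinverse would scale the last coordinate by a factor of 1e6, whereas `spectral_filter_solve(A, b, t=100.0, k=2)` sets that coordinate to zero and leaves the two well-conditioned coordinates essentially at their pseudoinverse values.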

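Key innovation: The central insight is that the iterative solver acts as an implicit spectral regularizer rather than primarily as an optimizer, which explains why it copes with the ill-posed normal equation where the exact pseudoinverse does not.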
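The claim that an iterative solver behaves like a spectral filter can be checked numerically on a toy quadratic. The sketch below is a standard linear-algebra identity, not code from the paper: plain gradient descent from zero on 0.5 * x^T A x - b^T x for `steps` iterations with step size `lr` equals applying the gain (1 - (1 - lr*lambda)^steps) / lambda in the eigenbasis of `A`. The function names and the random test problem are made up for illustration.

```python
import numpy as np

def gradient_descent(A, b, lr, steps):
    """Plain gradient descent on 0.5 * x^T A x - b^T x, started from zero."""
    x = np.zeros_like(b)
    for _ in range(steps):
        x = x - lr * (A @ x - b)
    return x

def gd_filter_solve(A, b, lr, steps):
    """Closed form of the same iterate, written as a per-eigenvalue gain."""
    eigvals, eigvecs = np.linalg.eigh(A)
    nonzero = np.abs(eigvals) > 1e-12
    safe = np.where(nonzero, eigvals, 1.0)
    # g(lam) = (1 - (1 - lr * lam)^steps) / lam, with limit lr * steps at lam = 0.
    gain = np.where(nonzero, (1.0 - (1.0 - lr * eigvals) ** steps) / safe, lr * steps)
    return eigvecs @ (gain * (eigvecs.T @ b))

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
A = M.T @ M                                # symmetric PSD toy operator
b = rng.standard_normal(4)
lr = 1.0 / np.linalg.eigvalsh(A).max()     # step size that keeps the iteration stable

x_gd = gradient_descent(A, b, lr, steps=200)
x_sf = gd_filter_solve(A, b, lr, steps=200)
print(np.allclose(x_gd, x_sf))
```

For small eigenvalues the gain stays close to `lr * steps` instead of growing like 1/lambda, which is the regularizing effect: stopping after a finite number of steps damps exactly the directions that the pseudoinverse would amplify.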

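Key design: Both SWUDI and SWUDI-A share a single symmetric eigendecomposition per linear layer and require no training data or optimizer state, which keeps the method efficient and adaptable.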
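To make the idea of a per-layer rank rule concrete, here is a purely hypothetical example. The energy-threshold criterion, the function name `rank_by_energy`, and the toy spectra are all assumptions chosen for illustration; the rules actually used by SWUDI-A are not described in this summary.

```python
import numpy as np

def rank_by_energy(eigvals_desc, energy=0.95):
    """Hypothetical rule: smallest K whose top-K eigenvalues hold `energy` of the total.

    eigvals_desc: eigenvalues of one layer's operator, sorted in descending order.
    """
    lam = np.clip(np.asarray(eigvals_desc, dtype=float), 0.0, None)
    total = lam.sum()
    if total == 0.0:
        return 0
    cumulative = np.cumsum(lam) / total
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, lam.size)  # guard against round-off pushing k past the end

# Toy spectra for two layers (made-up numbers): the rule picks a different K for each,
# reusing eigenvalues that a single eigendecomposition per layer would already provide.
layer_spectra = {
    "layer_a": np.array([8.0, 1.0, 0.5, 0.5]),
    "layer_b": np.array([3.0, 3.0, 3.0, 1.0]),
}
ranks = {name: rank_by_energy(spec, energy=0.85) for name, spec in layer_spectra.items()}
print(ranks)
```

A layer whose spectrum decays quickly ends up with a small K, while a layer with a flat spectrum keeps more directions, which is the kind of behaviour a single global rank cannot provide.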

📊 Experimental Highlights

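Across four general benchmarks and a multimodal merging benchmark spanning VQA, Geometry, Chart, OCR, Grounding, and modality merging, the proposed spectral solvers match or outperform state-of-the-art merging methods, while reducing wall-clock time by 28-72x and peak GPU memory by up to 50%.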

🎯 Application Scenarios

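Potential application areas include natural language processing, computer vision, and multimodal learning, where the method could consolidate models fine-tuned for different tasks, lower resource consumption, and make deployment more efficient. It may also prove useful in the development and deployment of large-scale models and in further work on model merging.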

📄 Abstract (Original)

Model merging combines several independently fine-tuned experts into a single multi-task model without any training data, reducing the storage, serving, and decentralized-development costs of large foundation models. State-of-the-art merging methods formulate merging as a layer-wise quadratic interference minimization problem. Although this problem admits an exact closed-form pseudoinverse solution, that solution underperforms hundreds of iterations of gradient descent in practice. The iterative loop dominates the cost of the pipeline, yet its effectiveness has remained unexplained. We revisit this regime and show that the iterative solver does not primarily act as an optimizer; rather, it serves as an implicit spectral regularizer for an ill-posed normal equation, where small-eigenvalue directions of the per-layer interference operator amplify proxy noise. Building on this finding, we formalize multi-task model merging as a noisy linear inverse problem and propose a spectral filtering estimator parameterized by a per-direction filter. We instantiate this estimator with SWUDI, a closed-form method that combines a soft exponential filter, which matches the gradient-flow trajectory of iterative descent, with a hard top-K truncation that suppresses noise-amplifying small-eigenvalue directions. Furthermore, we propose SWUDI-A, an adaptive variant that replaces the global rank hyperparameter with per-layer rank rules, further improving robustness across architectures. Both variants share a single symmetric eigendecomposition per linear layer and require no training data or optimizer state. Across four general benchmarks and a multimodal merging benchmark spanning VQA, Geometry, Chart, OCR, Grounding, and modality merging, our proposed spectral solvers match or outperform state-of-the-art merging methods. Crucially, they reduce wall-clock time by 28-72x and peak GPU memory by up to 50%.