On the Blessing of Pre-training in Weak-to-Strong Generalization

Authors: Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu

Category: cs.LG

Published: 2026-05-07

Comments: 40 pages, 14 figures

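💡 One-Sentence Takeaway

Explains what drives weak-to-strong generalization (W2SG): pre-training plays the key role by supplying a geometric warm start.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: weak-to-strong generalization, pre-training, single-index model, large language models, optimization theory, geometric warm start, phase-transition analysis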

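📋 Key Points

Prior work offers little theoretical support for the decisive role of pre-training in weak-to-strong generalization (W2SG), and under random initialization models often fail to generalize beyond their weak supervisor.
The paper models pre-training as spectral initialization and, using a high-dimensional single-index model, shows that the geometric warm start provided by pre-training is a prerequisite for entering the "effective region" where such generalization is achievable.
Experiments indicate that W2SG is not an innate property of the model but a phase transition that emerges as pre-training progresses, and they expose the balance between performance gains and the weak supervisor's bias.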

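📝 Abstract (translated from the Chinese)

The weak-to-strong generalization (W2SG) paradigm holds that a pre-trained strong model can surpass its weak supervisor. The decisive role of pre-training in this process, however, has received little theoretical or empirical study. This paper argues that pre-training is a necessary prerequisite for W2SG. On the theory side, the authors use a high-dimensional single-index model with spiked Gaussian data and formalize pre-training as a spectral initialization step. Building on impossibility results showing that learning fails under random initialization, they prove that W2SG is achievable when pre-training provides a geometric "warm start" that places the model inside an "effective region" characterized by perturbed strong convexity. Within this region they derive a rigorous generalization bound that captures the optimization dynamics: performance first improves, then saturates at a bottleneck set by the weak supervisor's bias. On the empirical side, they validate the theoretical assumptions in controlled synthetic simulations and evaluate hundreds of intermediate pre-training checkpoints of large language models, showing that W2SG is not an innate capability but a phase transition tightly coupled to the progress of pre-training.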

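🔬 Method Details

Problem definition: The paper addresses two open theoretical questions about weak-to-strong generalization (W2SG): why a strong model can surpass its weak supervisor, and how pre-training affects that process. Under random initialization, existing approaches often cannot exploit the weak supervision signal effectively, which limits generalization.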

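Core idea: Treat pre-training as a form of spectral initialization and analyze the model's parameter space geometrically. Pre-training steers the parameters into an "effective region" in which the loss exhibits perturbed strong convexity, so the model can keep optimizing on the weak supervision signal and end up surpassing the supervisor.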
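
The role of the warm start can be illustrated with a toy scalar recursion for the overlap `m` between the student's direction and the target direction. This is a hedged sketch, not the paper's analysis: the update rule `m += lr * k * m**(k-1) * (1 - m**2)`, the exponent `k = 3`, the learning rate, and the step budget are all assumptions chosen to mimic the flat landscape a single-index model can have near zero overlap.

```python
import math


def overlap_trajectory(m0, k=3, lr=0.01, steps=500):
    """Iterate a toy overlap recursion (an assumption, not the paper's dynamics).

    m is the cosine between the student direction and the target direction.
    The drift k * m**(k-1) * (1 - m**2) vanishes near m = 0, so a tiny
    initial overlap grows very slowly, and it vanishes again at m = 1.
    """
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m = min(1.0, m + lr * k * m ** (k - 1) * (1.0 - m * m))
        trajectory.append(m)
    return trajectory


if __name__ == "__main__":
    d = 10_000  # hypothetical ambient dimension
    cold = overlap_trajectory(1.0 / math.sqrt(d))  # random init: overlap ~ d**-0.5
    warm = overlap_trajectory(0.5)                 # hypothetical warm start
    print(f"random init, final overlap: {cold[-1]:.4f}")
    print(f"warm start,  final overlap: {warm[-1]:.4f}")
```

Under these assumed settings the cold start barely moves within the step budget while the warm start approaches full alignment, which is the qualitative gap the effective-region argument describes.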

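Technical framework: The study works in a high-dimensional single-index model and uses spiked Gaussian data to model the data distribution. The framework has a pre-training stage, which supplies the initial geometric position, and a fine-tuning stage, which optimizes on the weak supervision signal. Generalization is characterized by analyzing the optimization trajectory.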
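
A minimal NumPy sketch of spectral initialization on spiked Gaussian data follows. It assumes, for illustration only, that samples are drawn from N(0, I + spike * theta theta^T) and that the spike direction `theta` is the direction of interest; the function names, the spike strength, and the sample sizes are hypothetical and not taken from the paper.

```python
import numpy as np


def sample_spiked_gaussian(n, d, theta, spike, rng):
    """Draw n samples from N(0, I_d + spike * theta theta^T), theta a unit vector."""
    isotropic = rng.standard_normal((n, d))
    along_spike = rng.standard_normal(n)
    return isotropic + np.sqrt(spike) * along_spike[:, None] * theta[None, :]


def spectral_init(x):
    """Return the top eigenvector of the sample covariance of x."""
    cov = x.T @ x / x.shape[0]
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]


def overlap(u, v):
    """Absolute cosine similarity (sign-invariant alignment)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def compare_inits(n=4000, d=200, spike=4.0, seed=0):
    """Compare spectral and random initialization by overlap with theta."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)
    x = sample_spiked_gaussian(n, d, theta, spike, rng)
    w_spectral = spectral_init(x)
    w_random = rng.standard_normal(d)
    return overlap(w_spectral, theta), overlap(w_random, theta)


if __name__ == "__main__":
    spec, rand = compare_inits()
    print(f"spectral init overlap: {spec:.3f}")
    print(f"random init overlap:   {rand:.3f}")
```

With a spike this strong relative to d/n, the top eigenvector aligns closely with `theta`, whereas a random vector in d dimensions has overlap on the order of d^(-1/2). That gap is the sense in which spectral initialization acts as a warm start in this sketch.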

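Key innovations: The work is presented as the first theoretical proof that W2SG depends on the geometric warm start provided by pre-training. It also derives a rigorous generalization bound that describes the optimization dynamics and makes explicit the mathematical relationship between the initial performance gain and the saturation bottleneck caused by the weak supervisor's bias.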
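
The "improve, then saturate" shape can be seen in a textbook-style calculation, given here as an illustration and not as the paper's bound. Assume, hypothetically, that inside the effective region the weak-label objective is $\mu$-strongly convex and $\beta$-smooth with minimizer $\tilde{w}$, that $w^*$ is the ground-truth parameter, and that the weak supervisor's bias is summarized by $b = \|\tilde{w} - w^*\|$. All of these symbols are introduced here for illustration.

```latex
\begin{align*}
\|w_{t+1} - \tilde{w}\|^2
  &\le \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)\,\|w_t - \tilde{w}\|^2
  && \text{(gradient descent with step size } 1/\beta\text{)} \\
\|w_t - w^*\|
  &\le \|w_t - \tilde{w}\| + \|\tilde{w} - w^*\|
   \le \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)^{t/2}\,\|w_0 - \tilde{w}\| + b
  && \text{(triangle inequality)}
\end{align*}
```

The first term decays geometrically in $t$, which corresponds to the initial improvement. The second term does not depend on $t$, so under these assumptions the guarantee cannot fall below the floor $b$ set by the supervisor's bias.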

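Key design: The paper introduces the notion of an "effective region" and characterizes learnability geometrically through perturbed strong convexity. Empirically, it uses checkpoints from different stages of large language model pre-training and applies a phase-transition analysis to verify the tight coupling between the degree of pre-training and W2SG capability.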

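📊 Experimental Highlights

Controlled synthetic experiments support the theoretical derivations. In the large language model experiments, an analysis of hundreds of intermediate checkpoints shows that W2SG capability follows a "phase transition" over the course of pre-training: once the model passes a certain pre-training threshold, generalization jumps markedly, and performance then saturates under the weak supervisor's bias.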

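🎯 Applications

The study offers a theoretical foundation for large-model alignment and weakly supervised learning. Its results can inform the choice of pre-training checkpoint for fine-tuning and help improve model performance when high-quality labeled data is scarce, which is relevant to engineering efforts that aim to have large models improve on complex tasks with limited supervision.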

📄 Abstract (original)

The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work, we identify pre-training as the essential prerequisite for the emergence of W2SG. Theoretically, we formalize the W2SG problem within a high-dimensional single-index model framework using spiked Gaussian data, modeling pre-training as a spectral initialization step. Building upon prior impossibility results regarding the failure of learning under random initialization, we prove that W2SG is achievable when pre-training provides a geometric warm start that places the model within an "effective region" characterized by a perturbed strong-convexity geometry. Within this region, we derive a rigorous generalization bound that naturally captures the optimization dynamics: an initial performance improvement followed by a saturation bottleneck dictated by the weak supervisor's bias. Empirically, we first validate all our assumptions and theoretical insights through controlled synthetic simulations. Finally, through a massive-scale evaluation of hundreds of intermediate pre-training checkpoints from large language models, we demonstrate that W2SG is not an innate capability but emerges via a phase transition tightly coupled with the progression of pre-training.
