Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

Authors: Prasanjit Dubey, Xiaoming Huo

Categories: stat.ML, cs.IT, cs.LG

Published: 2026-05-28

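💡 One-Sentence Summary

Proposes an optimal bandwidth-allocation method for federated probe-logit distillation under heterogeneous bandwidth budgets.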

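🎯 Matched area: Pillar 2: RL Algorithms and Architecture (RL & Architecture)

Keywords: federated learning, bandwidth optimization, distillation training, heterogeneous networks, conditional distribution estimation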

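📋 Key Points

Existing federated learning methods struggle to estimate conditional distributions under bandwidth limits, especially when bandwidth is uneven across nodes.
The paper pairs new bandwidth bounds (a matching lower bound and a heterogeneous-bandwidth upper bound) with a closed-form optimal allocation strategy and a multi-round refinement of the distillation procedure.
In synthetic experiments the optimized allocation outperforms uniform and inverse-weighted baselines, and the empirical results are consistent with the theoretical bounds.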

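📝 Abstract (translated summary)

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients. The paper studies the minimax rate at which a conditional distribution can be estimated when each node may upload at most $B$ bits per probe query. It establishes a new heterogeneous-bandwidth upper bound together with an optimal allocation strategy, addressing the case of uneven bandwidth that existing methods leave open. Synthetic n-gram simulations are consistent with the theoretical bounds and show the advantage of the optimized allocation.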

🔬 Method Details

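Problem definition: The paper addresses how to estimate a conditional distribution in federated learning when bandwidth is uneven across nodes. Existing methods do not fully account for how bandwidth differences affect performance, which leaves the achievable estimation rate suboptimal.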

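Core idea: The paper derives a heterogeneous-bandwidth upper bound and an optimal allocation strategy that applies the reverse water-filling principle to each node's bit budget, improving overall performance.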
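The allocation rule quoted in the original abstract, $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes $\bar{w}_g$ is the geometric mean of the weights (which makes the budgets sum to $B_{\mathrm{tot}}$), and `bandwidth_objective` is a hypothetical surrogate of the form $\sum_i w_i 2^{-2B_i/V}$ used only to compare allocations. The function names and the example weights are likewise illustrative.

```python
import numpy as np

def optimal_allocation(weights, total_bits, vocab_size):
    """Closed-form budgets B_i* = B_tot/K + (V/2) * log2(w_i / w_bar_g)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")
    k = w.size
    # Assumption: w_bar_g is the geometric mean, so the log-tilts sum to zero.
    log2_geo_mean = np.mean(np.log2(w))
    return total_bits / k + 0.5 * vocab_size * (np.log2(w) - log2_geo_mean)

def bandwidth_objective(weights, bits, vocab_size):
    """Hypothetical surrogate for the bandwidth term: sum_i w_i * 2^(-2 B_i / V)."""
    w = np.asarray(weights, dtype=float)
    b = np.asarray(bits, dtype=float)
    return float(np.sum(w * 2.0 ** (-2.0 * b / vocab_size)))

# Illustrative numbers only (not taken from the paper).
weights = np.array([1.0, 2.0, 4.0, 8.0])
V, B_tot = 16, 256.0
b_star = optimal_allocation(weights, B_tot, V)
b_uniform = np.full(weights.size, B_tot / weights.size)

print("optimal budgets:", b_star)
print("surrogate at optimum:", bandwidth_objective(weights, b_star, V))
print("surrogate at uniform:", bandwidth_objective(weights, b_uniform, V))
```

The sketch does not round budgets to integers or guard against negative values, which the closed form can produce when weights are very unequal relative to $B_{\mathrm{tot}}$.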

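Technical framework: The pipeline consists of optimizing each node's upload size, distilling a global parametric student at the aggregator, and improving estimation accuracy through multi-round refinement. The main modules are bandwidth allocation, quantization, and aggregation.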
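The multi-round refinement step can be illustrated with a simple residual quantizer: each round quantizes what the previous rounds missed, on a range that shrinks by a factor of $2^{-b}$ for $b$ bits per coordinate. The sketch below is an assumption-laden stand-in for the nested/scaled residual quantizers named in the abstract: it is deterministic (no dither), works per coordinate, and the names `midrise_quantize` and `refine` are hypothetical.

```python
import numpy as np

def midrise_quantize(x, half_range, bits):
    """Uniform mid-rise quantizer with exactly 2**bits levels on [-half_range, half_range]."""
    levels = 2 ** bits
    step = 2.0 * half_range / levels
    # Cell index, clamped so that the endpoints map to the outermost cells.
    idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step  # reconstruct at the cell midpoint

def refine(x, half_range, bits, rounds):
    """Quantize x, then repeatedly quantize the residual on a shrinking range."""
    x = np.asarray(x, dtype=float)
    estimate = np.zeros_like(x)
    for _ in range(rounds):
        estimate = estimate + midrise_quantize(x - estimate, half_range, bits)
        # After this round the residual is at most half_range / 2**bits in magnitude.
        half_range = half_range / 2 ** bits
    return estimate

rng = np.random.default_rng(0)
x = rng.uniform(-4.0, 4.0, size=1000)
for rounds in (1, 2, 3):
    err = np.max(np.abs(refine(x, 4.0, bits=2, rounds=rounds) - x))
    print(rounds, err)
```

Under these assumptions the worst-case error after $T$ rounds is at most $c \cdot 2^{-Tb}$ for inputs in $[-c, c]$, so the squared error scales like $2^{-2Tb}$, the same exponent pattern as the $2^{-2TB/V}$ term in the abstract when $b = B/V$.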

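Key innovation: The main contribution is an optimal allocation strategy for heterogeneous bandwidths, given in closed form, which removes the performance bottleneck that existing methods face when bandwidth is uneven.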
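One way to see where a closed form of this shape comes from is a standard Lagrangian argument. The derivation below is a sketch under an assumption that is not stated in this summary: that the bandwidth part of the heterogeneous bound is proportional to $\sum_i w_i 2^{-2B_i/V}$ with positive weights $w_i$, and that $\bar{w}_g$ denotes their geometric mean.

$$
\min_{B_1,\dots,B_K} \; \sum_{i=1}^{K} w_i \, 2^{-2B_i/V}
\quad \text{subject to} \quad \sum_{i=1}^{K} B_i = B_{\mathrm{tot}} .
$$

Stationarity of the Lagrangian $\sum_i w_i 2^{-2B_i/V} + \lambda \left(\sum_i B_i - B_{\mathrm{tot}}\right)$ gives

$$
\frac{2 \ln 2}{V} \, w_i \, 2^{-2B_i/V} = \lambda
\;\;\Longrightarrow\;\;
w_i \, 2^{-2B_i/V} = c \ \text{ for all } i
\;\;\Longrightarrow\;\;
B_i = \frac{V}{2} \log_2 \frac{w_i}{c} .
$$

Summing over $i$ and using the budget constraint fixes $c$:

$$
B_{\mathrm{tot}} = \frac{V}{2} \sum_{i=1}^{K} \log_2 w_i - \frac{KV}{2} \log_2 c
\;\;\Longrightarrow\;\;
B_i^* = \frac{B_{\mathrm{tot}}}{K} + \frac{V}{2} \log_2 \frac{w_i}{\bar{w}_g},
\qquad
\bar{w}_g = \Big( \prod_{j=1}^{K} w_j \Big)^{1/K} .
$$

Because the objective is convex in the $B_i$, this stationary point is the minimizer whenever every $B_i^*$ is nonnegative; nodes with larger weights receive more bits, and every node's weighted distortion $w_i 2^{-2B_i^*/V}$ is equalized.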

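Key design: Each node transmits a quantized vector, and a bandwidth-based allocation rule sets the upload size so that each node's payload matches its bandwidth capacity.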

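📊 Experimental Highlights

In synthetic n-gram simulations, the empirical KL divergence is bracketed by the theoretical upper and lower bounds. Under heterogeneous clipping, the optimal allocation strictly dominates the uniform and inverse-weighted baselines, consistent with the theory.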

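🎯 Application Scenarios

Potential application areas include distributed machine learning, edge computing, and collaborative learning across smart devices. Optimizing bandwidth allocation allows more efficient model training in resource-constrained environments, which makes federated learning more practical to deploy.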

📄 Abstract (original)

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each node may upload at most $B$ bits per query in a public probe set. In federated probe-logit distillation (FPLD), each node transmits a scalar-quantized logit vector on the probe set, and an aggregator distills a global parametric student. Prior work (Dubey and Huo, 2026) establishes a high-probability KL rate $O(d/(Kn) + \rho\sqrt{V \log V / m} + K^{-1} \cdot 2^{-2B/V})$ plus optimization slack, with the bandwidth term in its trace-sharpened form. Whether this bandwidth-term rate is tight, and how the upper bound generalizes to heterogeneous per-node bandwidths, are left open. We close both gaps. First, the dithered FPLD construction has a matching single-round lower bound $\Omega(K^{-1} \cdot 2^{-2B/V})$ under non-degeneracy, pinning the bandwidth-axis rate at $\Theta(K^{-1} \cdot 2^{-2B/V})$. $T$-round sequential refinement with nested/scaled residual quantizers achieves $O(K^{-1} \cdot 2^{-2TB/V})$; vanilla FPLD's $T$-independent bandwidth term is suboptimal for every $T > 1$. Second, we establish a heterogeneous-bandwidth upper bound for per-node budgets $B_i$, paired with a closed-form optimal allocation $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$, a log-tilted water-filling rule that is the per-node analogue of reverse water-filling for distortion-rate optimization. A plug-in adaptive variant estimates the weights from a short warm-up phase and attains $1 + O(\sqrt{\log(K/\delta)/(m T_0)})$ relative suboptimality. Synthetic n-gram simulations confirm that empirical KL is bracketed by the upper and lower bounds and that the optimal allocation strictly dominates uniform and inverse-weighted baselines under heterogeneous clipping.
