Optimal Transceiver Design in Over-the-Air Federated Distillation

Authors: Zihao Hu, Jia Yan, Ying-Jun Angela Zhang, Jun Zhang, Khaled B. Letaief

Categories: eess.SP, cs.AI

Published: 2025-07-21

Comments: 13 pages, 7 figures, submitted to IEEE Transactions on Wireless Communications

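💡 One-Sentence Summary

Proposes an over-the-air federated distillation framework and optimizes its transceiver design to speed up the convergence of federated learning.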

🎯 Matched Area: Pillar 2: RL Algorithms & Architecture

Keywords: federated learning, knowledge distillation, over-the-air aggregation, transceiver design, wireless communications

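📋 Key Points

Existing federated learning methods become inefficient with large AI models because transmitting model parameters incurs heavy communication overhead.
Proposes an over-the-air federated distillation framework in which devices share only model outputs rather than parameters, and these outputs are aggregated over the wireless channel.
Optimizes the transceiver design to maximize the learning convergence rate; the framework as a whole substantially reduces communication overhead.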

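📝 Abstract (Translated from Chinese)

This paper proposes a novel over-the-air federated distillation (FD) framework that combines the strengths of federated learning (FL) and knowledge distillation to avoid heavy local model transmission. Instead of sharing model parameters, the wireless devices (WDs) share only their model outputs, i.e., knowledge, which are aggregated over the air by exploiting the superposition property of the multiple-access channel. We study transceiver design in over-the-air FD, aiming to maximize the learning convergence rate subject to the transceivers' power constraints. The main challenges are the intractability of the learning performance analysis and the non-convex optimization problem spanning the entire FD training period. To address them, we first derive an analytical expression for the convergence rate of over-the-air FD. Then, for a given receiver combining strategy, we obtain closed-form optimal solutions for the WDs' transmit power and the over-the-air aggregation estimator. Building on this, we propose an efficient semidefinite relaxation (SDR) approach to find the optimal receiver beamforming vector, and we prove that there is no optimality gap between the original and relaxed receiver beamforming problems. Numerical results show that, compared with conventional FL benchmarks, the proposed over-the-air FD approach substantially reduces communication overhead at the cost of only a slight loss in test accuracy.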
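To make the receiver-beamforming step concrete, the block below writes out the generic single-stream over-the-air computation model commonly used for this kind of problem. It is an assumption, not the paper's formulation: an N-antenna receiver with beamformer m, unit-power symbols s_k, a per-device power limit P, and noise n ~ CN(0, σ²I). It shows how, once m is fixed, the transmit scalars and the estimator follow in closed form, and how the remaining beamforming problem is lifted to a semidefinite program by writing M = m m^H and dropping the rank-one constraint. The paper's objective comes from its convergence-rate expression, so its problem and closed forms may differ from this MSE-based version, and its no-optimality-gap result applies to its own formulation rather than to this generic one.

```latex
\begin{aligned}
&\mathbf{y} = \sum_{k=1}^{K} \mathbf{h}_k b_k s_k + \mathbf{n}, \qquad
 \hat{s} = \frac{\mathbf{m}^{\mathsf H}\mathbf{y}}{\sqrt{\eta}}, \qquad
 |b_k|^2 \le P, \\
&b_k = \sqrt{\eta}\,\frac{(\mathbf{m}^{\mathsf H}\mathbf{h}_k)^{*}}{|\mathbf{m}^{\mathsf H}\mathbf{h}_k|^{2}}
 \;\Rightarrow\;
 \eta^{\star} = P \min_{k} |\mathbf{m}^{\mathsf H}\mathbf{h}_k|^{2}, \qquad
 \mathbb{E}\Big|\hat{s} - \sum_{k} s_k\Big|^{2}
 = \frac{\sigma^{2}\|\mathbf{m}\|^{2}}{P \min_{k} |\mathbf{m}^{\mathsf H}\mathbf{h}_k|^{2}}, \\
&\text{(scale-invariant in } \mathbf{m}\text{)} \;\Rightarrow\;
 \min_{\mathbf{m}} \|\mathbf{m}\|^{2}
 \ \ \text{s.t.}\ \ |\mathbf{m}^{\mathsf H}\mathbf{h}_k|^{2} \ge 1,\ \forall k, \\
&\mathbf{M} = \mathbf{m}\mathbf{m}^{\mathsf H},\ \mathbf{H}_k = \mathbf{h}_k\mathbf{h}_k^{\mathsf H}
 \;\Rightarrow\;
 \min_{\mathbf{M} \succeq \mathbf{0}} \operatorname{tr}(\mathbf{M})
 \ \ \text{s.t.}\ \ \operatorname{tr}(\mathbf{M}\mathbf{H}_k) \ge 1,\ \forall k
 \quad (\operatorname{rank}(\mathbf{M}) = 1 \text{ dropped}).
\end{aligned}
```

If the relaxed solution M has rank one, its principal eigenvector recovers the beamformer m exactly; otherwise a rounding step such as Gaussian randomization is typically needed, which is why a proof of zero optimality gap matters.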

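🔬 Method Details

Problem definition: When training large AI models, existing federated learning methods must transmit a large number of model parameters, and the resulting communication overhead becomes the efficiency bottleneck. The problem is worse in federated learning with wireless devices, where bandwidth is limited. The core question this paper addresses is therefore how to reduce communication overhead and improve the efficiency of federated learning.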

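Core idea: Borrowing from knowledge distillation, each wireless device shares the outputs of its locally trained model (i.e., its knowledge) rather than its model parameters. These outputs are aggregated directly over the air by exploiting the superposition property of the wireless multiple-access channel, which avoids transmitting large numbers of model parameters. The transceiver design is then optimized to further improve the efficiency and accuracy of this aggregation.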

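Technical framework: The over-the-air federated distillation framework consists of the following stages:
1) Local training: each wireless device trains its model on its local data.
2) Knowledge extraction: each device takes the outputs of its local model as its knowledge.
3) Over-the-air aggregation: all devices transmit their knowledge to the server simultaneously over the wireless channel, and the superposition property of the multiple-access channel performs the aggregation.
4) Model update: the server updates the global model using the aggregated knowledge.
5) Model distribution: the server distributes the updated global model to the wireless devices.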
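A minimal sketch of stages 2 and 3 follows. It assumes, as in public-dataset variants of federated distillation, that a device's knowledge is its temperature-softened softmax output on a small shared reference batch; the summary does not specify the knowledge format, so the reference batch, the temperature `T`, and all function names are hypothetical. Aggregation is modeled as the error-free entrywise average that over-the-air computation approximates, and `distillation_loss` is the standard KL-divergence objective that a model being updated in stage 4 would minimize against the aggregated knowledge.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def extract_knowledge(device_logits, temperature):
    """Stage 2 (assumed form): one device's soft labels on every reference sample."""
    return [softmax(row, temperature) for row in device_logits]


def aggregate_knowledge(all_knowledge):
    """Stage 3, idealized: noiseless entrywise average over devices,
    the quantity that over-the-air computation approximates."""
    num_devices = len(all_knowledge)
    num_samples = len(all_knowledge[0])
    num_classes = len(all_knowledge[0][0])
    return [
        [sum(k[s][c] for k in all_knowledge) / num_devices for c in range(num_classes)]
        for s in range(num_samples)
    ]


def distillation_loss(student_logits, teacher_probs, temperature):
    """Mean KL(teacher || student) over the reference batch."""
    total = 0.0
    for logits, target in zip(student_logits, teacher_probs):
        pred = softmax(logits, temperature)
        total += sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)
    return total / len(student_logits)


# Toy example: 3 devices, 2 reference samples, 3 classes (all values hypothetical).
T = 2.0
device_logits = [
    [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]],
    [[1.5, 1.0, -0.5], [-0.2, 2.0, 0.0]],
    [[2.5, 0.0, -1.5], [0.4, 1.0, 0.8]],
]
knowledge = [extract_knowledge(d, T) for d in device_logits]
global_knowledge = aggregate_knowledge(knowledge)
loss = distillation_loss(device_logits[0], global_knowledge, T)
print(global_knowledge)
print(loss)
```

Each device uploads only a 2 x 3 array of soft labels here, independent of how many parameters its model has; that size independence is the source of the communication savings.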

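Key innovation: The main technical contribution is combining knowledge distillation with over-the-air aggregation, which markedly reduces communication overhead. Unlike conventional federated learning, the method transmits only model outputs instead of model parameters, greatly reducing the amount of transmitted data. In addition, the optimized transceiver design further improves the efficiency and accuracy of over-the-air aggregation.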
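The overhead argument comes down to payload size and to whether devices share the channel. The arithmetic below is a back-of-the-envelope sketch with hypothetical numbers (10 devices, a 1,000,000-parameter model, knowledge taken as 10-class outputs on 100 reference samples). It assumes one channel use per transmitted value; the function names are illustrative, and the numbers do not reproduce the paper's experimental setup or results.

```python
def knowledge_payload(num_reference_samples, num_classes):
    """Values uploaded per device per round if knowledge = outputs on a
    shared reference batch (an assumption, not the paper's definition)."""
    return num_reference_samples * num_classes


def uplink_channel_uses(num_devices, payload, over_the_air):
    """Channel uses needed to collect one payload-sized vector from every device.

    Assumes one channel use per value. With orthogonal access each device needs
    its own resources; with over-the-air computation all devices transmit at
    once, so the count does not grow with the number of devices.
    """
    return payload if over_the_air else num_devices * payload


# Hypothetical sizes, for illustration only.
K = 10
num_params = 1_000_000
fd_payload = knowledge_payload(num_reference_samples=100, num_classes=10)

orthogonal_fl = uplink_channel_uses(K, num_params, over_the_air=False)
air_fl = uplink_channel_uses(K, num_params, over_the_air=True)
air_fd = uplink_channel_uses(K, fd_payload, over_the_air=True)
print(orthogonal_fl, air_fl, air_fd)  # 10000000 1000000 1000
```

Over-the-air aggregation removes the factor K, and sharing knowledge instead of parameters shrinks the per-device payload; the two savings multiply.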

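Key design: For the transceiver, the paper derives an analytical expression for the convergence rate of over-the-air FD and, based on it, obtains closed-form optimal solutions for the WDs' transmit power and the over-the-air aggregation estimator for a given receiver combining strategy. It further proposes a semidefinite relaxation (SDR) based algorithm to find the optimal receiver beamforming vector and proves that there is no optimality gap between the original and relaxed problems. Together, these designs maximize the learning convergence rate while satisfying the power constraints.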
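For a fixed receive beamformer `m`, a common way to obtain closed-form transmit scalars and a receive scaling in over-the-air computation is channel inversion: device k pre-scales by b_k = sqrt(eta) (m^H h_k)^* / |m^H h_k|^2, so every signal arrives with the same gain, and the per-device limit |b_k|^2 <= P forces eta = P min_k |m^H h_k|^2. The sketch below implements this classic MSE-driven design in plain Python, with complex vectors stored as lists. It is a stand-in for illustration, not the paper's convergence-optimal solution, which is derived from its convergence-rate expression and may differ; the function names, the unit-power-symbol assumption, and the example channels are hypothetical.

```python
import math


def inner(m, h):
    """Return m^H h for complex vectors stored as Python lists."""
    return sum(mi.conjugate() * hi for mi, hi in zip(m, h))


def channel_inversion_design(m, channels, p_max):
    """Transmit scalars b_k and receive scaling eta for a fixed beamformer m.

    Assumes unit-power symbols, so the power constraint reads |b_k|^2 <= p_max.
    Each b_k equalizes the effective scalar channel m^H h_k to sqrt(eta); the
    device with the weakest effective channel then transmits at exactly p_max.
    """
    gains = [inner(m, h) for h in channels]
    eta = p_max * min(abs(g) ** 2 for g in gains)
    tx = [math.sqrt(eta) * g.conjugate() / abs(g) ** 2 for g in gains]
    return tx, eta


def receive_estimate(m, channels, tx, eta, symbols, noise):
    """Superpose y = sum_k h_k b_k s_k + n, then return s_hat = m^H y / sqrt(eta)."""
    y = [
        noise[i] + sum(h[i] * b * s for h, b, s in zip(channels, tx, symbols))
        for i in range(len(m))
    ]
    return inner(m, y) / math.sqrt(eta)


def aggregation_mse(m, eta, noise_var):
    """Receiver-noise MSE sigma^2 ||m||^2 / eta of the sum estimate, noise ~ CN(0, noise_var I)."""
    return noise_var * sum(abs(mi) ** 2 for mi in m) / eta


# Hypothetical example: 2 receive antennas, 3 devices, P = 1, noise variance 0.1.
channels = [[1 + 1j, 0.5 - 0.2j], [0.3 + 0.8j, -1 + 0.4j], [0.9 - 0.1j, 0.2 + 0.2j]]
m = [1 + 0j, 0.5 + 0.5j]
tx, eta = channel_inversion_design(m, channels, p_max=1.0)
symbols = [0.2, -0.7, 1.1]  # one knowledge entry from each device
est = receive_estimate(m, channels, tx, eta, symbols, noise=[0j, 0j])
mse = aggregation_mse(m, eta, noise_var=0.1)
print(est, mse)
```

In the noiseless call the estimate equals the sum of the device symbols up to floating-point error; with noise, its error variance is the value returned by `aggregation_mse`. Choosing `m` to minimize that MSE is exactly the beamforming problem that the SDR step addresses.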

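📊 Experimental Highlights

Numerical results show that, compared with conventional federated learning benchmarks, the proposed over-the-air FD approach substantially reduces communication overhead with only a minor loss in test accuracy. In other words, it greatly reduces the amount of data the wireless devices must transmit while largely preserving model performance, which improves the efficiency of federated learning.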

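🎯 Application Scenarios

The results apply to scenarios such as wireless sensor networks, Internet of Things (IoT) devices, and edge computing, and are especially suited to devices with limited computing resources and communication bandwidth. By reducing communication overhead and making federated learning more efficient, the approach can accelerate the deployment of AI models in applications such as smart homes, smart cities, and industrial automation.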

📄 Abstract (Original)

The rapid proliferation and growth of artificial intelligence (AI) has led to the development of federated learning (FL). FL allows wireless devices (WDs) to cooperatively learn by sharing only local model parameters, without needing to share the entire dataset. However, the emergence of large AI models has made existing FL approaches inefficient, due to the significant communication overhead required. In this paper, we propose a novel over-the-air federated distillation (FD) framework by synergizing the strength of FL and knowledge distillation to avoid the heavy local model transmission. Instead of sharing the model parameters, only the WDs' model outputs, referred to as knowledge, are shared and aggregated over-the-air by exploiting the superposition property of the multiple-access channel. We shall study the transceiver design in over-the-air FD, aiming to maximize the learning convergence rate while meeting the power constraints of the transceivers. The main challenge lies in the intractability of the learning performance analysis, as well as the non-convex nature and the optimization spanning the whole FD training period. To tackle this problem, we first derive an analytical expression of the convergence rate in over-the-air FD. Then, the closed-form optimal solutions of the WDs' transmit power and the estimator for over-the-air aggregation are obtained given the receiver combining strategy. Accordingly, we put forth an efficient approach to find the optimal receiver beamforming vector via semidefinite relaxation. We further prove that there is no optimality gap between the original and relaxed problem for the receiver beamforming design. Numerical results will show that the proposed over-the-air FD approach achieves a significant reduction in communication overhead, with only a minor compromise in testing accuracy compared to conventional FL benchmarks.
