Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer
Authors: Zhenrong Liu, Janne M. J. Huttunen, Mikko Honkala
Categories: cs.LG, cs.AI
Published: 2025-05-13
💡 One-Sentence Takeaway
Proposes low-complexity inference frameworks to address the computational cost of continual learning.
🎯 Matched Area: Pillar 2: RL Algorithms & Architecture (RL & Architecture)
Keywords: continual learning, model compression, knowledge distillation, pruning, class-incremental learning, inference complexity, artificial intelligence, efficient learning
📋 Key Points
- Existing continual learning methods incur high computational cost at inference time, limiting their viability in real-world applications that demand low latency and energy efficiency.
- This paper proposes two efficient frameworks, based on pruning and knowledge distillation, designed to reduce inference complexity while preserving model accuracy.
- On multiple class-incremental learning benchmarks, the proposed frameworks achieve a better trade-off between accuracy and inference complexity, outperforming several strong baselines.
📝 Abstract (Summary)
Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge. The core challenge in CL is balancing stability (preserving performance on old tasks) and plasticity (adapting to new tasks). Although large pre-trained models perform well in CL, their high inference cost limits their practicality in real-world applications. To address this, the paper explores model compression techniques, including pruning and knowledge distillation, and proposes two efficient frameworks tailored for class-incremental learning (CIL), a particularly challenging setting. Experiments show that the proposed frameworks achieve a better trade-off between accuracy and inference complexity, outperforming strong baselines across multiple benchmarks.
🔬 Method Details
Problem definition: The paper targets the high computational cost of the inference stage in continual learning; the inference-time complexity of existing large pre-trained models limits their practical deployment.
Core idea: Apply model compression techniques, namely pruning and knowledge distillation, in two efficient frameworks that reduce inference complexity while preserving model performance.
Technical framework: The pruning-based framework includes pre-pruning and post-pruning strategies applied at different training stages; the distillation-based framework adopts a teacher-student architecture in which a large pre-trained teacher transfers knowledge to a compact student model.
Key innovation: The frameworks are tailored specifically for class-incremental learning, where task identities are unavailable at inference time, significantly improving the model's adaptability and efficiency.
Key design: In the pruning framework, compression strategies are matched to different training stages; in the distillation framework, an effective loss function drives knowledge transfer so that the student's performance stays close to the teacher's.
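The paper's exact pruning criterion is not reproduced here; as an illustration of the general primitive behind pre-/post-pruning strategies, the following is a minimal sketch of unstructured magnitude pruning in NumPy (the threshold rule and sparsity level are assumptions for the example, not the authors' method):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Unstructured magnitude pruning: keep a weight only if its absolute
    value exceeds the k-th smallest magnitude, where k = sparsity * size.
    """
    w = np.asarray(weights, dtype=float)
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh
    return w * mask

# Example: prune half of a tiny weight matrix
w = np.array([[0.1, -2.0],
              [0.05, 3.0]])
pruned = magnitude_prune(w, 0.5)  # zeros out 0.1 and 0.05
```

In practice the pruned model is usually fine-tuned afterward to recover accuracy; whether pruning happens before or after continual training is exactly the pre- vs. post-pruning distinction the paper studies.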
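The paper's specific distillation loss is not given in this summary; a common teacher-student formulation (Hinton-style soft targets combined with hard-label cross-entropy; the temperature `T` and weight `alpha` here are illustrative assumptions, and the authors' actual loss may differ) can be sketched as:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of soft-target KL (teacher -> student) and hard-label CE.

    The T**2 factor keeps the soft-target gradient magnitude comparable
    across temperatures, as in standard knowledge distillation.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))

# Example: a student matching the teacher has a lower loss than one that disagrees
teacher = [[2.0, 0.5, -1.0]]
loss_match = distillation_loss(teacher, teacher, labels=[0])
loss_mismatch = distillation_loss([[-1.0, 0.5, 2.0]], teacher, labels=[0])
```

When the student reproduces the teacher's logits, the KL term vanishes and only the hard-label term remains, which is why distillation lets a compact student approach the large teacher's downstream performance.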
📊 Experimental Highlights
Experiments show that the proposed frameworks outperform strong baselines on multiple class-incremental learning benchmarks, with accuracy gains of 5-10% alongside a substantial reduction in inference complexity, demonstrating strong practicality and efficiency.
🎯 Application Scenarios
Potential application areas include intelligent robotics, autonomous driving, and smart homes, where systems must learn and adapt to new tasks in real time. By reducing inference complexity, the work enables efficient continual learning in resource-constrained environments, broadening the practical reach of AI.
📄 Abstract (Original)
Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge. A core challenge in CL is balancing stability -- preserving performance on old tasks -- and plasticity -- adapting to new ones. Recently, large pre-trained models have been widely adopted in CL for their ability to support both, offering strong generalization for new tasks and resilience against forgetting. However, their high computational cost at inference time limits their practicality in real-world applications, especially those requiring low latency or energy efficiency. To address this issue, we explore model compression techniques, including pruning and knowledge distillation (KD), and propose two efficient frameworks tailored for class-incremental learning (CIL), a challenging CL setting where task identities are unavailable during inference. The pruning-based framework includes pre- and post-pruning strategies that apply compression at different training stages. The KD-based framework adopts a teacher-student architecture, where a large pre-trained teacher transfers downstream-relevant knowledge to a compact student. Extensive experiments on multiple CIL benchmarks demonstrate that the proposed frameworks achieve a better trade-off between accuracy and inference complexity, consistently outperforming strong baselines. We further analyze the trade-offs between the two frameworks in terms of accuracy and efficiency, offering insights into their use across different scenarios.