CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs

Authors: Junchen Zhao, Ali Derakhshan, Jayden Kana Hyman, Junhao Dong, Sangeetha Abdu Jyothi, Ian Harris

Category: cs.LG

Published: 2025-09-18 (updated: 2025-12-12)

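💡 One-Sentence Summary

Proposes CoopQ, a mixed-precision quantization method for deploying LLMs in low-resource settings

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: mixed-precision quantization, large language models, cooperative game, Shapley value, model compression, low-resource deployment, quantization estimation, optimization algorithm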

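📋 Key Points

Existing mixed-precision quantization methods perform poorly when the average precision drops below four bits, because they do not account for inter-layer interactions.
The paper frames mixed-precision quantization as a cooperative game among layers and introduces SPQE to estimate layer sensitivities.
Experiments show that CoopQ reduces perplexity by 20% to 80% relative to the best baseline across multiple models, with the margin growing as the bit-width tightens.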

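📝 Abstract (Translated)

Large language models (LLMs) show impressive capabilities, but their multi-billion-parameter scale makes on-device or low-resource deployment difficult. Mixed-precision quantization offers an effective solution, but existing methods perform poorly when the average precision drops below four bits, because they rely on isolated, layer-specific metrics and overlook the inter-layer interactions that affect overall performance. To address these limitations, the paper frames mixed-precision quantization as a cooperative game among layers and introduces Shapley-based Progressive Quantization Estimation (SPQE) to efficiently obtain accurate Shapley estimates of layer sensitivities and inter-layer interactions. Building on the SPQE estimates, it proposes Cooperative Game Inspired Mixed-Precision Quantization (CoopQ), which translates these Shapley estimates into a binary quadratic optimization formulation that assigns either 2-bit or 4-bit precision to each layer under strict memory constraints. Comprehensive experiments show that CoopQ performs well across multiple models and offers better scalability and performance than methods that rely solely on isolated metrics.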

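🔬 Method Details

Problem definition: The paper addresses mixed-precision quantization of large language models for low-resource environments. When the average precision drops below four bits, existing methods rely on isolated, layer-specific metrics and cannot capture inter-layer interactions, which degrades performance.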

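Core idea: The paper treats mixed-precision quantization as a cooperative game among layers and uses Shapley values to quantify both layer sensitivities and inter-layer interactions, which supports better-informed quantization decisions.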
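To make the Shapley idea concrete, here is a minimal sketch of a generic permutation-sampling Shapley estimator over "layers". This is not the paper's SPQE procedure, whose details this summary does not give. The names `shapley_permutation_estimate` and `toy_value`, and the toy payoff numbers, are hypothetical; in practice the value function would measure model quality for a given set of layers.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_permutation_estimate(
    n_layers: int,
    value: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate each layer's Shapley value by averaging its marginal
    contribution over random orderings of the layers."""
    rng = random.Random(seed)
    totals = [0.0] * n_layers
    order = list(range(n_layers))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for layer in order:
            coalition.add(layer)
            cur = value(frozenset(coalition))
            totals[layer] += cur - prev  # marginal contribution of this layer
            prev = cur
    return [t / n_permutations for t in totals]


def toy_value(coalition: FrozenSet[int]) -> float:
    """Hypothetical payoff: per-layer terms plus one pairwise interaction."""
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    v = sum(base[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        v += 1.0  # layers 0 and 1 interact; an isolated metric would miss this
    return v


phi = shapley_permutation_estimate(3, toy_value, n_permutations=2000)
print(phi)
```

In this toy game the interaction bonus is split between layers 0 and 1, so their estimates land near 1.5 and 2.5 rather than the isolated values 1.0 and 2.0, while layer 2 stays at 3.0.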

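Technical framework: The overall architecture has two main modules. The first, Shapley-based Progressive Quantization Estimation (SPQE), estimates layer sensitivities and inter-layer interactions. The second, Cooperative Game Inspired Mixed-Precision Quantization (CoopQ), turns these estimates into a binary quadratic optimization problem that assigns a precision to each layer under a memory constraint.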

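Key innovation: The main technical contribution is framing mixed-precision quantization as a cooperative game and introducing Shapley-value estimation, which differs fundamentally from existing approaches based on isolated per-layer metrics.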

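Key design: A binary quadratic optimization formulation handles the precision assignment, choosing either 2-bit or 4-bit precision for each layer so as to obtain the best achievable performance under strict memory limits.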
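The binary quadratic formulation can be illustrated with a small brute-force sketch. Everything here is an assumption made for illustration: the objective form (per-layer sensitivity plus pairwise interaction terms), the function name `solve_bit_allocation`, and all numbers. The paper's actual objective and solver are not described in this summary, and exhaustive search only works for a handful of layers.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def solve_bit_allocation(
    sens: Sequence[float],
    inter: Sequence[Sequence[float]],
    sizes: Sequence[int],
    budget_bits: int,
) -> Tuple[Optional[List[int]], Optional[float]]:
    """Exhaustively search x in {0,1}^n, where x[i] = 1 means layer i is
    stored at 2 bits (otherwise 4 bits), minimizing
        sum_i sens[i]*x[i] + sum_{i<j} inter[i][j]*x[i]*x[j]
    subject to a total memory budget in bits."""
    n = len(sens)
    best_cost: Optional[float] = None
    best_bits: Optional[List[int]] = None
    for x in product((0, 1), repeat=n):
        bits = [2 if xi else 4 for xi in x]
        memory = sum(size * b for size, b in zip(sizes, bits))
        if memory > budget_bits:
            continue  # violates the memory constraint
        cost = sum(sens[i] * x[i] for i in range(n))
        cost += sum(
            inter[i][j] * x[i] * x[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_bits = cost, bits
    return best_bits, best_cost


# Hypothetical 4-layer example: equal-sized layers, average budget of 3 bits.
sens = [5.0, 1.0, 1.5, 2.0]          # estimated cost of dropping a layer to 2 bits
inter = [[0.0] * 4 for _ in range(4)]
inter[1][2] = 4.0                    # layers 1 and 2 degrade badly together
bits, cost = solve_bit_allocation(sens, inter, sizes=[1, 1, 1, 1], budget_bits=12)
print(bits, cost)
```

With these made-up numbers, ranking layers by isolated sensitivity would push layers 1 and 2 to 2 bits (cost 6.5), whereas the quadratic objective picks layers 1 and 3 (cost 3.0) because it sees the interaction penalty.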

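📊 Experimental Highlights

Experiments show that CoopQ performs well on Llama-3, Gemma-2, and Qwen-3 models: at average precisions between 2 and 4 bits, it reduces perplexity by 20% to 80% relative to the best baseline. The gain widens as the bit-width tightens, which indicates good scalability and effectiveness.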
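For readers who want to reproduce the two quantities used above, the sketch below computes a size-weighted average bit-width and a relative perplexity reduction. The helper names and every number in the example are hypothetical and are not results from the paper.

```python
from typing import Sequence


def average_bits(bits: Sequence[int], sizes: Sequence[int]) -> float:
    """Average precision, weighting each layer by its parameter count."""
    total_params = sum(sizes)
    return sum(b * s for b, s in zip(bits, sizes)) / total_params


def relative_reduction(baseline_ppl: float, method_ppl: float) -> float:
    """Fraction by which perplexity drops relative to the baseline."""
    return (baseline_ppl - method_ppl) / baseline_ppl


# Hypothetical model: 32 equal-sized layers, 8 kept at 4 bits, 24 at 2 bits.
layer_bits = [4] * 8 + [2] * 24
avg = average_bits(layer_bits, sizes=[1] * 32)

# Hypothetical perplexities, chosen only to illustrate the formula.
reduction = relative_reduction(baseline_ppl=20.0, method_ppl=12.0)
print(avg, reduction)
```

Here the mix averages 2.5 bits per weight, and a drop from 20.0 to 12.0 perplexity would count as a 40% reduction.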

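🎯 Application Scenarios

Potential applications include deploying large language models on mobile devices, in edge computing, and in other low-resource environments. Effective mixed-precision quantization can substantially reduce memory footprint and compute requirements while preserving model performance, which helps make AI technology more widely accessible.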

📄 Abstract (Original)

Large Language Models (LLMs) promise impressive capabilities, yet their multi-billion-parameter scale makes on-device or low-resource deployment prohibitive. Mixed-precision quantization offers a compelling solution, but existing methods struggle when the average precision drops below four bits, as they rely on isolated, layer-specific metrics that overlook critical inter-layer interactions affecting overall performance. To address these limitations, we first frame the mixed-precision quantization problem as a cooperative game among layers and introduce Shapley-based Progressive Quantization Estimation (SPQE) to efficiently obtain accurate Shapley estimates of layer sensitivities and inter-layer interactions. Leveraging the SPQE estimates, we propose Cooperative Game Inspired Mixed-Precision Quantization (CoopQ) which translates these Shapley estimates into a binary quadratic optimization formulation, assigning either 2 or 4-bit precision to layers under strict memory constraints. Comprehensive experiments conducted on Llama-3, Gemma-2, and Qwen-3 models across three independent PTQ backends (Quanto, HQQ, GPTQ) demonstrate CoopQ's scalability and consistently superior performance compared to methods relying solely on isolated metrics. Across average precisions spanning 4 bit down to 2 bit, CoopQ cuts Perplexity by 20 - 80 % relative to the best baseline, with the margin growing as the bit-width tightens.
