LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
Authors: Yubin Wu, Zicheng Cai, Liping Ning, Hua Wang, Zhi Chen, Yaohua Tang, Hao Chen
Categories: cs.AI, cs.LG
Published: 2026-05-08
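💡 One-Sentence Takeaway
Proposes LiteGUI, a framework that improves lightweight GUI agents through reinforcement learning and multi-solution distillation.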
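🎯 Matched Area: Pillar 2: RL Algorithms and Architecture (RL & Architecture)
Keywords: GUI agents, knowledge distillation, reinforcement learning, on-device models, multimodal interaction, policy optimization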
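📋 Key Points
- Existing lightweight GUI agents are limited by model capacity, and conventional supervised fine-tuning (SFT) tends to cause overfitting, catastrophic forgetting, and policy rigidity, which makes complex interaction hard to handle.
- The paper proposes an SFT-free training paradigm that combines Guided On-policy Distillation with a Multi-solution Dual-level GRPO framework, aligning macro-level subtask planning with micro-level execution matching.
- Experiments show clear gains for 2B/3B-scale models: state-of-the-art results among lightweight models on the reported benchmarks, and performance beyond the limits of conventional imitation learning.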
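📝 Abstract (translated from Chinese)
Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. Current on-device agents, however, are constrained by model capacity, and their performance has hit a bottleneck. Conventional supervised fine-tuning (SFT) of small models often leads to overfitting, catastrophic forgetting, and policy rigidity, so it copes poorly with complex tasks. The paper therefore proposes an SFT-free training paradigm. First, Guided On-policy Distillation brings generalized knowledge distillation into the GUI domain; by combining oracle reference trajectories with a dynamic retrieval mechanism, it reduces hallucinations and mitigates the cognitive misalignment that arises in multi-solution tasks. On top of this, a Multi-solution Dual-level GRPO framework aligns macro-level subtask planning with micro-level execution matching, which improves exploration in long-horizon tasks. In addition, an automated data generation pipeline synthesizes trajectories with rich multi-solution annotations. Experiments show that 2B/3B-scale models trained this way reach state-of-the-art performance among lightweight models while remaining competitive with substantially larger models.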
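🔬 Method Details
Problem definition: The paper addresses the performance bottleneck of lightweight GUI agents in resource-constrained settings. Existing methods rely on SFT, which leaves models with rigid policies, frequent hallucinations, and weak generalization on long-horizon, multi-solution GUI tasks.
Core idea: Drop the conventional SFT stage and use reinforcement-learning-driven on-policy distillation instead. The "multi-solution" view acknowledges that a GUI task can have several valid paths; oracle trajectories guide the model's exploration so that its decisions stay flexible in complex scenarios.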
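The on-policy distillation idea can be sketched as follows. In generalized knowledge distillation, the student samples its own sequence and a teacher scores the same tokens, and the loss is a divergence between the two per-token distributions. This summary does not say which divergence LiteGUI uses or how the oracle reference is supplied to the teacher, so the reverse KL below and the function names `reverse_kl` and `on_policy_distill_loss` are assumptions for illustration only.

```python
import math


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at one token position over a shared vocabulary.

    Assumption: reverse KL is used here only as one common choice of divergence.
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0
    )


def on_policy_distill_loss(student_steps, teacher_steps):
    """Mean per-token divergence along a sequence sampled from the student.

    Both arguments are lists of probability vectors, one per generated token.
    The teacher vectors are taken as given; in a guided setup they could come
    from a teacher that also sees a reference trajectory (hypothetical).
    """
    if len(student_steps) != len(teacher_steps):
        raise ValueError("student and teacher must cover the same tokens")
    total = sum(reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps))
    return total / len(student_steps)


if __name__ == "__main__":
    student = [[0.5, 0.5], [0.8, 0.2]]
    teacher = [[0.9, 0.1], [0.8, 0.2]]
    print(on_policy_distill_loss(student, teacher))
```

Because the sequence comes from the student rather than from a fixed dataset, the loss is evaluated on the states the student actually visits, which is the property that separates this from plain SFT on teacher outputs.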
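Framework: The architecture has three core modules: (1) Guided On-policy Distillation, which uses a dynamic retrieval mechanism to obtain reference trajectories; (2) the Multi-solution Dual-level GRPO framework, which performs alignment at both the macro planning level and the micro execution level; and (3) an automated data generation pipeline that builds a high-quality training set with multi-solution annotations.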
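One way to picture a dual-level, multi-solution reward is the minimal sketch below. It scores a rollout against every annotated solution at two levels, the subtask plan (macro) and the concrete actions (micro), and keeps the best match, so any valid path can earn full reward. The scoring rules, the `plan`/`actions` dictionary layout, the weight `w_macro`, and all function names are hypothetical; the summary does not give the paper's actual reward definition.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def macro_score(pred_plan, ref_plan):
    """Order-aware overlap between predicted and reference subtask plans."""
    if not pred_plan or not ref_plan:
        return 0.0
    return lcs_len(pred_plan, ref_plan) / max(len(pred_plan), len(ref_plan))


def micro_score(pred_actions, ref_actions):
    """Fraction of step positions whose actions match exactly."""
    if not pred_actions or not ref_actions:
        return 0.0
    hits = sum(p == r for p, r in zip(pred_actions, ref_actions))
    return hits / max(len(pred_actions), len(ref_actions))


def dual_level_reward(pred, solutions, w_macro=0.5):
    """Best combined macro/micro score over all annotated solutions."""
    return max(
        (
            w_macro * macro_score(pred["plan"], s["plan"])
            + (1.0 - w_macro) * micro_score(pred["actions"], s["actions"])
            for s in solutions
        ),
        default=0.0,
    )


if __name__ == "__main__":
    via_settings = {
        "plan": ["open_settings", "toggle_wifi"],
        "actions": [("tap", "settings"), ("tap", "wifi")],
    }
    via_quick_panel = {
        "plan": ["swipe_down", "toggle_wifi"],
        "actions": [("swipe", "down"), ("tap", "wifi")],
    }
    print(dual_level_reward(via_settings, [via_quick_panel, via_settings]))
```

Taking the maximum over solutions is the design choice that matters here: a rollout that follows the settings path is not penalized just because another annotation uses the quick panel.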
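Key innovation: The paper presents what the authors describe as the first systematic integration of generalized knowledge distillation into the GUI agent domain, and it proposes a dual-level GRPO (Group Relative Policy Optimization) mechanism that targets the mismatch between planning and execution in long-horizon tasks.
Key design: A dynamic retrieval mechanism mitigates cognitive misalignment. GRPO optimizes the policy distribution while keeping computation efficient, and the multi-solution annotations steer the model toward more effective exploration of the policy space, which is how the method aims to push past the capability ceiling of small models.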
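For readers unfamiliar with GRPO, the core of the algorithm is a group-relative advantage: several rollouts are sampled for the same task, and each rollout's reward is standardized against the group's mean and standard deviation, so no learned value network is required. The sketch below shows only that step, in its commonly published form; the function name, the `eps` value, and the use of the population standard deviation are choices made for this illustration, and nothing here is specific to LiteGUI's dual-level variant.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward within its sampling group.

    rewards: scalar rewards for rollouts sampled from the same prompt/task.
    Returns one advantage per rollout; positive means better than the
    group average, negative means worse.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one GUI task: two succeed, two fail.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # Identical rewards carry no learning signal: all advantages are zero.
    print(group_relative_advantages([0.5, 0.5, 0.5]))
```

Dropping the critic is the source of the efficiency mentioned above, which matters when the policy itself is only a 2B/3B model.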
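📊 Experimental Highlights
According to the reported experiments, LiteGUI performs strongly on several mainstream GUI benchmarks: the 2B/3B-scale models clearly outperform conventional imitation-learning baselines and remain competitive with models that have far more parameters. Ablation studies further indicate that structured on-policy distillation and dual-level exploration are the key ingredients for unlocking the potential of small models and for overcoming the performance ceiling of conventional training paradigms.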
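🎯 Application Scenarios
The work applies mainly to automated interaction on mobile and desktop platforms, such as automated testing, intelligent assistants, accessibility tools, and cross-platform task automation. Because the models are lightweight, they are well suited to deployment on phones, tablets, and other edge devices, where they can offer low-latency automated operation; this gives the approach clear potential for commercial use.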
📄 Abstract (Original)
Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. However, current on-device agents are constrained by limited model capacity, and further performance improvements remain urgently needed. Traditional Supervised Fine-Tuning (SFT) for small-scale models often leads to overfitting, catastrophic forgetting and policy rigidity, and thus fails to fully address these challenges. In this work, we propose a novel SFT-free training paradigm that significantly enhances the performance of small-scale models. We first present the initial systematic integration of generalized knowledge distillation into the GUI agent domain via Guided On-policy Distillation. By incorporating oracle reference trajectories together with a dynamic retrieval mechanism, our method reduces hallucinations and mitigates the cognitive misalignment inherent in multi-solution GUI tasks. Building on this foundation, we further introduce a Multi-solution Dual-level GRPO framework that jointly aligns macro-level subtask planning with micro-level execution matching, thereby improving exploration in long-horizon GUI agent scenarios. In addition, we construct an automated data generation pipeline to synthesize GUI task trajectories with rich multi-solution annotations. Extensive experiments show that our method achieves state-of-the-art performance among lightweight models while remaining competitive with substantially larger-scale models across all benchmarks. Ablation studies further demonstrate that structured on-policy distillation and multi-solution dual-level exploration can fully unlock the capabilities of 2B/3B scale agents, surpassing the performance limits of conventional imitation learning.