LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning

Authors: Yubin Wu, Zicheng Cai, Liping Ning, Hua Wang, Zhi Chen, Yaohua Tang, Hao Chen

Categories: cs.AI, cs.LG

Published: 2026-05-08

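💡 One-Sentence Summary

Proposes the LiteGUI framework, which improves the performance of lightweight GUI agents through reinforcement learning and multi-solution distillation.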

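🎯 Matched area: Pillar 2: RL Algorithms & Architecture (RL & Architecture)

Keywords: GUI agents, knowledge distillation, reinforcement learning, on-device models, multimodal interaction, policy optimization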

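📋 Key Points

Existing lightweight GUI agents are limited by model capacity, and conventional supervised fine-tuning (SFT) tends to cause overfitting, catastrophic forgetting, and policy rigidity, so it falls short on complex interaction tasks.
The paper proposes an SFT-free training paradigm that combines Guided On-policy Distillation with a Multi-solution Dual-level GRPO framework to align macro-level subtask planning with micro-level execution.
Experiments indicate that the method improves 2B/3B-scale models, reaches state-of-the-art results among lightweight models on the reported benchmarks, and surpasses conventional imitation learning.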

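📝 Abstract (Summary)

Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. Current on-device agents, however, are constrained by model capacity, and their performance has hit a bottleneck. Conventional supervised fine-tuning (SFT) of small models often leads to overfitting, catastrophic forgetting, and policy rigidity, which makes complex tasks hard to handle. The paper therefore proposes a training paradigm that does not require SFT. First, Guided On-policy Distillation brings generalized knowledge distillation into the GUI domain; combined with oracle reference trajectories and a dynamic retrieval mechanism, it reduces hallucinations and mitigates the cognitive misalignment that arises in multi-solution tasks. On top of this, a Multi-solution Dual-level GRPO framework aligns macro-level subtask planning with micro-level execution matching and strengthens exploration in long-horizon tasks. In addition, an automated data generation pipeline synthesizes trajectories with rich multi-solution annotations. Experiments show that the method achieves state-of-the-art performance among lightweight models at the 2B/3B scale while remaining competitive with substantially larger models.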

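🔬 Method Details

Problem definition: The paper targets the performance bottleneck of lightweight GUI agents in resource-constrained settings. Existing methods rely on SFT, which leaves models with rigid policies, frequent hallucinations, and weak generalization on long-horizon, multi-solution GUI tasks.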

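Core idea: Drop the conventional SFT paradigm in favor of reinforcement-learning-driven on-policy distillation. The "multi-solution" notion acknowledges that a GUI task can have several valid paths; oracle trajectories guide the model's exploration, which makes its decisions more flexible in complex scenarios.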

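Technical framework: The overall architecture has three core modules: (1) Guided On-policy Distillation, which uses a dynamic retrieval mechanism to obtain reference trajectories; (2) the Multi-solution Dual-level GRPO framework, which performs alignment at both the macro planning level and the micro execution level; and (3) an automated data generation pipeline that builds a high-quality training set with multi-solution annotations.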
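As a rough illustration of the on-policy distillation module, the sketch below computes a token-level divergence between a student and a teacher on a sequence sampled by the student. The summary does not give the paper's actual loss, so several things here are assumptions: the use of reverse KL (one of the divergences commonly used in generalized knowledge distillation), the idea that the teacher logits were produced with a retrieved oracle reference trajectory in context, and all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(
        ps * (math.log(ps) - math.log(pt))
        for ps, pt in zip(p_s, p_t)
        if ps > 0.0
    )


def distillation_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over a student-sampled sequence.

    student_steps / teacher_steps: one logit vector per generated token.
    Assumption: the teacher logits were computed on the student's own
    sample, with the oracle reference trajectory available as guidance.
    """
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)


# Toy usage with a 3-token vocabulary and a 2-token sequence.
student = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.0]]
teacher = [[0.3, 0.2, 0.1], [1.0, -1.0, 0.0]]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student already matches the teacher at every position and positive otherwise, so minimizing it pulls the student toward the teacher on the states the student actually visits.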

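Key innovations: According to the authors, this is the first systematic application of generalized knowledge distillation to GUI agents. The paper also proposes a dual-level GRPO (Group Relative Policy Optimization) mechanism that addresses the mismatch between planning and execution in long-horizon tasks.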
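For readers unfamiliar with GRPO, the defining step is that each sampled rollout is scored relative to the other rollouts drawn for the same prompt, with no learned value function. The sketch below shows only that group-relative normalization; the function name, the `eps` constant, and the choice of population standard deviation are assumptions for illustration, and the paper's dual-level variant is not reproduced here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps), so
    rollouts that beat the group average get a positive advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Toy usage: four rollouts for one GUI task, two of which succeeded.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason richer reward shaping matters for small models.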

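Key design choices: A dynamic retrieval mechanism mitigates cognitive misalignment. GRPO optimizes the policy distribution while staying computationally efficient, and the multi-solution annotations steer the model toward more effective exploration of the policy space, which raises the capability ceiling of small models.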
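One way to picture a multi-solution, dual-level reward is to score a predicted trajectory against every annotated solution at two levels and keep the best match. This is a hypothetical sketch, not the paper's reward: the `Solution` type, the prefix-match scoring, the `w_macro` weight, and the example action strings are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    subtasks: tuple  # macro level: ordered subtask plan
    actions: tuple   # micro level: ordered low-level GUI actions


def prefix_match(pred, ref):
    """Fraction of `ref` covered by the longest common prefix with `pred`."""
    if not ref:
        return 0.0
    n = 0
    for p, r in zip(pred, ref):
        if p != r:
            break
        n += 1
    return n / len(ref)


def dual_level_reward(pred, solutions, w_macro=0.5):
    """Best weighted macro/micro match over all annotated solutions."""
    best = 0.0
    for ref in solutions:
        macro = prefix_match(pred.subtasks, ref.subtasks)
        micro = prefix_match(pred.actions, ref.actions)
        best = max(best, w_macro * macro + (1.0 - w_macro) * micro)
    return best


# Two valid ways to turn on Wi-Fi (hypothetical annotations).
via_settings = Solution(("open_settings", "toggle_wifi"),
                        ("tap:settings", "tap:wifi"))
via_panel = Solution(("open_quick_panel", "toggle_wifi"),
                     ("swipe:down", "tap:wifi_tile"))
reward = dual_level_reward(via_panel, [via_settings, via_panel])
```

Taking the maximum over solutions means a rollout is not penalized for following a valid path that differs from one particular reference, which is the behavior a single-reference imitation objective cannot express.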


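📊 Experimental Highlights

LiteGUI performs strongly on several mainstream GUI benchmarks. Its 2B/3B-scale models outperform conventional imitation-learning baselines and remain competitive with models that have far more parameters. Ablation studies further indicate that structured on-policy distillation and dual-level exploration are the key to unlocking the potential of small models and to getting past the performance ceiling of conventional training paradigms.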

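🎯 Application Scenarios

The work applies mainly to automated interaction on mobile and desktop platforms, such as automated testing, intelligent assistants, accessibility tools, and cross-platform task automation. Because the models are lightweight, they are well suited to edge devices such as phones and tablets, where they can provide low-latency automated operation, which suggests strong potential for commercial deployment.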

📄 Abstract (Original)

Developing lightweight, on-device vision-language GUI agents is essential for efficient cross-platform automated interaction. However, current on-device agents are constrained by limited model capacity, and further performance improvements remain urgently needed. Traditional Supervised Fine-Tuning (SFT) for small-scale models often leads to overfitting, catastrophic forgetting and policy rigidity, and thus fails to fully address these challenges. In this work, we propose a novel SFT-free training paradigm that significantly enhances the performance of small-scale models. We first present the initial systematic integration of generalized knowledge distillation into the GUI agent domain via Guided On-policy Distillation. By incorporating oracle reference trajectories together with a dynamic retrieval mechanism, our method reduces hallucinations and mitigates the cognitive misalignment inherent in multi-solution GUI tasks. Building on this foundation, we further introduce a Multi-solution Dual-level GRPO framework that jointly aligns macro-level subtask planning with micro-level execution matching, thereby improving exploration in long-horizon GUI agent scenarios. In addition, we construct an automated data generation pipeline to synthesize GUI task trajectories with rich multi-solution annotations. Extensive experiments show that our method achieves state-of-the-art performance among lightweight models while remaining competitive with substantially larger-scale models across all benchmarks. Ablation studies further demonstrate that structured on-policy distillation and multi-solution dual-level exploration can fully unlock the capabilities of 2B/3B scale agents, surpassing the performance limits of conventional imitation learning.
