Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues

Authors: Gideon Peters, SayedHassan Khatoonabadi, Emad Shihab

Categories: cs.SE, cs.AI

Published: 2026-01-09

Comments: Accepted to the ACM International Conference on Mining Software Repositories (MSR 2026)

💡 One-Sentence Summary

Evaluates large language models for the automated resolution of DOM-level web performance issues.

🎯 Matched Areas: Pillar 1: Robot Control; Pillar 9: Embodied Foundation Models

Keywords: web performance optimization, large language models, DOM manipulation, automation, performance auditing

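📋 Key Points

Web performance optimization is time-consuming when done by hand, especially when it requires modifying the DOM.
The study uses large language models (LLMs) to automate DOM modifications that improve web performance.
Experiments evaluate nine LLMs on resolving web performance issues and find large differences in effectiveness across models.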

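📝 Abstract (Translated)

Users increasingly demand fast, seamless web experiences, but developers often struggle to meet these expectations under tight constraints. Performance optimization is critical, yet it is time-consuming and often manual. Modifying the Document Object Model (DOM) is among the most complex tasks involved, so this study focuses on it. Recent advances in large language models (LLMs) offer a promising way to automate this complex task and could change how developers address web performance issues. This study evaluates the effectiveness of nine state-of-the-art LLMs in automating web performance issue resolution. To this end, we first extracted the DOM trees of 15 popular webpages (e.g., Facebook) and then used Lighthouse to retrieve their performance audit reports. We then passed the extracted DOM trees and corresponding audit reports to each model for resolution. Our study considers 7 unique audit categories, and the results show that LLMs consistently perform well on SEO and accessibility issues. However, their effectiveness on performance-critical DOM manipulations is mixed. High-performing models such as GPT-4.1 achieved significant reductions in areas such as initial load, interactivity, and network optimization (e.g., audit incidence reductions of 46.52% to 48.68%). Other models, such as GPT-4o-mini, performed notably worse.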

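🔬 Method Details

Problem definition: The paper addresses a specific problem in web performance optimization. Manually modifying DOM structures is time-consuming and error-prone. Existing practice relies on manual analysis and modification, which is inefficient and hard to scale.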

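Core idea: The approach relies on the comprehension and generation abilities of LLMs. The model receives a DOM tree and its performance audit report as input and generates an optimized DOM structure. The key step is recasting performance optimization as a text-generation task that an LLM can understand and solve.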
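The sketch below makes the "text-generation task" framing concrete. It assembles a single prompt from a DOM snapshot and a list of failing audits. The template wording, the `AuditIssue` type, and `build_prompt` are hypothetical illustrations. The paper's actual prompt is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class AuditIssue:
    """One failing audit (hypothetical shape, loosely modeled on a Lighthouse audit entry)."""
    audit_id: str
    title: str


def build_prompt(dom_html: str, issues: list[AuditIssue]) -> str:
    """Cast DOM-level performance fixing as text in, text out (hypothetical template)."""
    issue_lines = "\n".join(f"- [{issue.audit_id}] {issue.title}" for issue in issues)
    return (
        "You are a web performance engineer.\n"
        "Resolve the audit failures listed below by modifying the DOM.\n"
        "Return only the complete modified HTML document.\n\n"
        f"Audit failures:\n{issue_lines}\n\n"
        f"DOM:\n{dom_html}\n"
    )


prompt = build_prompt(
    '<html><head><script src="app.js"></script></head><body></body></html>',
    [AuditIssue("render-blocking-resources", "Eliminate render-blocking resources")],
)
print(prompt)
```

This template asks for the complete document rather than a patch. The benefit is that the model's output can be re-audited directly. The cost is very long outputs when the DOM is large.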

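Technical framework: The overall pipeline is: 1) extract the DOM tree of the target webpage; 2) generate a performance audit report with a tool such as Lighthouse; 3) feed the DOM tree and audit report to the LLM as a prompt; 4) have the LLM generate an optimized DOM tree; 5) evaluate the performance metrics of the optimized DOM tree. The study compares nine different LLMs.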
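The five steps can be wired together roughly as follows. This sketch rests on several assumptions. `audit_fn` and `model_fn` are hypothetical stand-ins for a Lighthouse run and an LLM call. The 0.9 pass threshold is our choice, not the paper's. The report fields the code reads (`audits`, `score`, `scoreDisplayMode`) follow the shape of Lighthouse's JSON output. Lighthouse audits a URL rather than an HTML string, so a real `audit_fn` would first have to serve the modified DOM. How the study re-audits modified pages is not described here.

```python
from typing import Callable

# Lighthouse-style report: {"audits": {audit_id: {"score": float | None, "scoreDisplayMode": str}}}
Report = dict
SCORED_MODES = {"binary", "numeric", "metricSavings"}


def failing_audits(report: Report, pass_threshold: float = 0.9) -> set[str]:
    """IDs of scored audits below pass_threshold (the threshold is an assumption)."""
    failing = set()
    for audit_id, audit in report.get("audits", {}).items():
        score = audit.get("score")
        if audit.get("scoreDisplayMode") in SCORED_MODES and score is not None and score < pass_threshold:
            failing.add(audit_id)
    return failing


def evaluate_page(
    dom_html: str,
    audit_fn: Callable[[str], Report],
    model_fn: Callable[[str, list[str]], str],
) -> dict[str, set[str]]:
    """Steps 2-5: audit, ask the model for a fixed DOM, re-audit, and compare."""
    before = failing_audits(audit_fn(dom_html))      # step 2: audit the extracted DOM
    fixed_dom = model_fn(dom_html, sorted(before))    # steps 3-4: DOM + issues in, new DOM out
    after = failing_audits(audit_fn(fixed_dom))       # step 5: re-audit the model's output
    return {
        "resolved": before - after,
        "remaining": before & after,
        "introduced": after - before,
    }


# Hypothetical stand-ins for a real Lighthouse run and a real LLM call.
def fake_audit(dom_html: str) -> Report:
    lazy = 'loading="lazy"' in dom_html
    return {"audits": {
        "offscreen-images": {"score": 1 if lazy else 0, "scoreDisplayMode": "metricSavings"},
        "document-title": {"score": 1, "scoreDisplayMode": "binary"},
        "diagnostics": {"score": None, "scoreDisplayMode": "informative"},
    }}


def fake_model(dom_html: str, issues: list[str]) -> str:
    return dom_html.replace("<img ", '<img loading="lazy" ')


result = evaluate_page('<body><img src="hero.jpg"></body>', fake_audit, fake_model)
print(result)
```

The "introduced" set tracks audits that fail only after the model's change. That is the kind of regression the study reports for Visual Stability.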

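Key innovation: The study's main contribution is applying LLMs to automated, DOM-level web performance optimization. Traditional optimization methods are rule-based or heuristic. This approach instead draws on the LLM's comprehension and generation abilities, which lets it handle more complex optimization scenarios.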

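Key design: Prompt design is crucial: the prompt must clearly describe the problem and the intended optimization goal. Post-processing the LLM-generated DOM tree is another key design step, for example checking that the output is valid and consistent. Specific parameter settings and loss functions are not specified.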
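The sketch below shows one possible form of the post-processing check. It parses the generated HTML with Python's standard `html.parser` and flags unbalanced tags. It also reports element `id`s that existed in the original DOM but are missing from the output. Treating lost `id`s as an inconsistency is our assumption, because scripts and styles often select on them. The check is also stricter than HTML itself, which allows some end tags (such as `</p>` and `</li>`) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class _DomChecker(HTMLParser):
    """Collects element ids and tracks open tags to spot unbalanced markup."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.errors: list[str] = []
        self.ids: set[str] = set()

    def _record_id(self, attrs) -> None:
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

    def handle_starttag(self, tag, attrs):
        self._record_id(attrs)
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record ids, leave nothing open.
        # (Lenient for non-void tags like <div/>, which browsers treat as open.)
        self._record_id(attrs)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def _check(html: str) -> _DomChecker:
    checker = _DomChecker()
    checker.feed(html)
    checker.close()
    return checker


def validate_generated_dom(original_html: str, generated_html: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed these checks."""
    original = _check(original_html)
    generated = _check(generated_html)
    problems = list(generated.errors)
    problems += [f"unclosed <{tag}>" for tag in generated.open_tags]
    problems += [f"missing id #{i}" for i in sorted(original.ids - generated.ids)]
    return problems


print(validate_generated_dom('<div id="app"><p>Hi</p></div>', '<div><p>Hi</div>'))
```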

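📊 Experimental Highlights

High-performing LLMs such as GPT-4.1 achieved significant reductions in failing audits in key performance areas such as initial load, interactivity, and network optimization. These were audit incidence reductions of 46.52% to 48.68%. However, effectiveness varied considerably across LLMs, and some models performed poorly. Further analysis shows that the LLMs mostly optimize by adding elements and changing element positions. This strategy can degrade visual stability.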
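This summary does not define "audit incidence reduction." A plausible reading, assumed here, is the relative drop in the number of failing audit instances per category across all pages: reduction = (before - after) / before × 100. The audit-to-category mapping below is hypothetical. The study's 7 categories are its own grouping, not Lighthouse's. A negative value signals a regression.

```python
from collections import Counter

# Hypothetical mapping from Lighthouse audit IDs to the study's category names.
CATEGORY_OF = {
    "render-blocking-resources": "Initial Load",
    "offscreen-images": "Network Optimization",
    "cumulative-layout-shift": "Visual Stability",
}


def category_incidence(failing_per_page: list[set[str]], category_of: dict[str, str]) -> Counter:
    """Count failing audit instances per category, summed over all pages."""
    counts: Counter = Counter()
    for failing in failing_per_page:
        for audit_id in failing:
            counts[category_of.get(audit_id, "Other")] += 1
    return counts


def incidence_reduction(before_pages, after_pages, category_of) -> dict[str, float]:
    """Percent reduction per category; negative values are regressions.

    Categories with zero baseline incidence are skipped because the
    percentage is undefined for them.
    """
    before = category_incidence(before_pages, category_of)
    after = category_incidence(after_pages, category_of)
    return {cat: 100.0 * (before[cat] - after[cat]) / before[cat] for cat in before}


# Toy inputs for illustration only, not the study's data.
before_pages = [
    {"render-blocking-resources", "offscreen-images", "cumulative-layout-shift"},
    {"render-blocking-resources", "offscreen-images"},
]
after_pages = [
    {"render-blocking-resources", "cumulative-layout-shift"},
    {"cumulative-layout-shift"},
]
reductions = incidence_reduction(before_pages, after_pages, CATEGORY_OF)
print(reductions)
```

Under this reading, a category that had no failures before the change has no defined percentage. A regression there would have to be reported as a raw count.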

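🎯 Application Scenarios

The findings could be applied to automated web performance optimization tools that help developers quickly locate and resolve DOM-level performance issues. Integrated into a CI/CD pipeline, such tools could enable continuous web performance monitoring and optimization. The expected benefits are a better user experience, lower server load, and better SEO rankings. Future work could extend the approach to mobile and rich-client applications.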
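In a CI/CD setting, one simple policy would accept an LLM-proposed DOM change under two conditions. First, the change must not introduce new failures in audits the team guards. Second, it must not increase the total number of failing audits. The example guards a layout-shift audit because the study reports Visual Stability regressions. The function name, the guarded set, and the policy itself are illustrative assumptions.

```python
def gate_llm_change(before: set[str], after: set[str], guarded: set[str]) -> tuple[bool, list[str]]:
    """Decide whether an LLM-modified DOM may ship (hypothetical CI policy)."""
    messages: list[str] = []
    blocked = False
    for audit_id in sorted(after - before):
        # New failures in guarded audits block the change; others only warn.
        if audit_id in guarded:
            blocked = True
            messages.append(f"blocking: new failing audit {audit_id}")
        else:
            messages.append(f"warning: new failing audit {audit_id}")
    if len(after) > len(before):
        blocked = True
        messages.append(f"blocking: failing audits rose from {len(before)} to {len(after)}")
    return not blocked, messages


ok, notes = gate_llm_change(
    before={"render-blocking-resources", "offscreen-images"},
    after={"offscreen-images", "cumulative-layout-shift"},
    guarded={"cumulative-layout-shift"},
)
print("PASS" if ok else "FAIL", *notes, sep="\n")
```

A CI script could map `ok` to its exit status so that a blocked change fails the pipeline step.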

📄 Abstract (Original)

Users demand fast, seamless webpage experiences, yet developers often struggle to meet these expectations within tight constraints. Performance optimization, while critical, is a time-consuming and often manual process. One of the most complex tasks in this domain is modifying the Document Object Model (DOM), which is why this study focuses on it. Recent advances in Large Language Models (LLMs) offer a promising avenue to automate this complex task, potentially transforming how developers address web performance issues. This study evaluates the effectiveness of nine state-of-the-art LLMs for automated web performance issue resolution. For this purpose, we first extracted the DOM trees of 15 popular webpages (e.g., Facebook), and then we used Lighthouse to retrieve their performance audit reports. Subsequently, we passed the extracted DOM trees and corresponding audits to each model for resolution. Our study considers 7 unique audit categories, revealing that LLMs universally excel at SEO & Accessibility issues. However, their efficacy in performance-critical DOM manipulations is mixed. While high-performing models like GPT-4.1 delivered significant reductions in areas like Initial Load, Interactivity, and Network Optimization (e.g., 46.52% to 48.68% audit incidence reductions), others, such as GPT-4o-mini, notably underperformed, consistently. A further analysis of these modifications showed a predominant additive strategy and frequent positional changes, alongside regressions particularly impacting Visual Stability.
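The additive and positional patterns could be measured by diffing the element sequences of the original and modified DOMs. The sketch below is our illustration, not the paper's analysis procedure. It keys each element by tag plus `id` (or classes) and aligns the two sequences with `difflib.SequenceMatcher`. An element counts as moved when the same key is deleted at one position and inserted at another. Keying elements this way is a heuristic: identical unlabeled elements, such as bare `<p>`s, cannot be told apart.

```python
from collections import Counter
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _ElementKeys(HTMLParser):
    """Records one key per element in document order: tag, plus #id or .classes."""

    def __init__(self) -> None:
        super().__init__()
        self.keys: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        key = tag
        if attr_map.get("id"):
            key += "#" + attr_map["id"]
        elif classes:
            key += "." + ".".join(classes)
        self.keys.append(key)


def element_keys(html: str) -> list[str]:
    parser = _ElementKeys()
    parser.feed(html)
    parser.close()
    return parser.keys


def classify_changes(original_html: str, modified_html: str) -> dict[str, int]:
    """Count added, removed, and moved elements between two DOM snapshots."""
    a, b = element_keys(original_html), element_keys(modified_html)
    inserted: Counter = Counter()
    deleted: Counter = Counter()
    for op, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if op in ("delete", "replace"):
            deleted.update(a[i1:i2])
        if op in ("insert", "replace"):
            inserted.update(b[j1:j2])
    moved = inserted & deleted  # same key removed in one place, added in another
    return {
        "added": sum((inserted - moved).values()),
        "removed": sum((deleted - moved).values()),
        "moved": sum(moved.values()),
    }


original = '<div id="app"><header></header><main><p>x</p></main><script src="a.js"></script></div>'
modified = ('<div id="app"><script src="a.js"></script><header></header>'
            '<main><p>x</p><img src="hero.jpg"></main>'
            '<link rel="preload" href="hero.jpg" as="image"></div>')
print(classify_changes(original, modified))
```

In this toy example, the script is hoisted (one move), and an image and a preload link are added (two additions). That mix of changes resembles the additive, position-shifting pattern the abstract describes.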

