EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Authors: Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li

Categories: cs.DB, cs.AI, cs.CL, cs.LG

Published: 2026-03-11

Note: Accepted by VLDB 2025

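💡 One-Sentence Takeaway

EvoSchema: a benchmark for evaluating text-to-SQL robustness against database schema evolution

🎯 Matched area: Pillar 3: Spatial Perception & Semantics (Perception & Semantics)

Keywords: text-to-SQL, schema evolution, robustness, databases, natural language processing, benchmark, LLM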

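📋 Key Points

Existing text-to-SQL models degrade significantly when the schema evolves; they lack robustness to schema dynamics.
EvoSchema builds a schema evolution benchmark with ten perturbation types to systematically evaluate and improve model robustness.
Experiments show that table-level perturbations have a greater impact, and that models trained on EvoSchema are more resistant to perturbation.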

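📝 Abstract (translated from Chinese)

Neural text-to-SQL models have made remarkable progress in translating natural language questions (NLQs) into SQL queries. However, database schemas frequently evolve to meet new requirements, which degrades the performance of models trained on static schemas. Existing work either focuses mainly on paraphrasing the syntactic or semantic mappings among NLQ, database, and SQL, or lacks a comprehensive and controllable way to study model robustness under schema evolution. This is insufficient for the increasingly complex and rich schema changes seen in practice, especially in the LLM era. To address these challenges, the authors propose EvoSchema, a comprehensive benchmark designed to evaluate and enhance the robustness of text-to-SQL systems under realistic schema changes. EvoSchema introduces a new schema evolution taxonomy with ten perturbation types spanning column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Using EvoSchema, the authors evaluate a range of open-source and closed-source LLMs in depth and find that table-level perturbations affect model performance far more than column-level changes. EvoSchema also motivates the development of more resilient text-to-SQL systems, in both model training and database design. Training on EvoSchema's diverse schema designs forces a model to distinguish schema differences for the same question and so avoid learning spurious patterns; such models are markedly more robust than those trained on unperturbed data. The benchmark offers valuable insight into model behavior and a path toward systems that can thrive in dynamic, real-world environments.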

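🔬 Method Details

Problem definition: Existing text-to-SQL models are trained on static database schemas. When the schema evolves (for example, when tables or columns are added, removed, or modified), model performance drops significantly. Existing approaches either focus on simple paraphrasing or lack a comprehensive and controllable way to study schema evolution, so they cannot cope with the complex, changing schemas found in real-world databases.

Core idea: EvoSchema builds a benchmark dataset that covers multiple kinds of schema evolution, and it evaluates and improves the robustness of text-to-SQL models by systematically introducing different types of perturbation. The benchmark is intended to simulate the dynamic schema changes of real-world databases and to support the development of more resilient text-to-SQL systems.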

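Technical framework: At the core of the EvoSchema benchmark is its schema evolution taxonomy, which defines ten perturbation types covering column-level modifications (for example, renaming a column or changing its data type) and table-level modifications (for example, adding or removing a table). Researchers can use EvoSchema to evaluate existing text-to-SQL models under each perturbation and to develop new training strategies or model architectures that improve robustness. The overall workflow is: 1) define the schema evolution types; 2) generate the perturbed database schemas; 3) evaluate model performance on the original and perturbed schemas.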
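To make step 2 of the workflow concrete, here is a minimal Python sketch of how perturbed schemas could be generated. It is an illustration under stated assumptions, not EvoSchema's actual implementation: the dictionary-based schema representation, the function names, and the `singer`/`concert` example tables are all hypothetical, and only the four perturbations named in this summary are covered (the full taxonomy has ten).

```python
from copy import deepcopy

# Hypothetical representation: table name -> {column name: SQL type}.
Schema = dict[str, dict[str, str]]


def rename_column(schema: Schema, table: str, old: str, new: str) -> Schema:
    """Column-level perturbation: rename one column, keeping column order."""
    out = deepcopy(schema)
    out[table] = {(new if c == old else c): t for c, t in out[table].items()}
    return out


def change_column_type(schema: Schema, table: str, column: str, new_type: str) -> Schema:
    """Column-level perturbation: change the declared type of one column."""
    out = deepcopy(schema)
    out[table][column] = new_type
    return out


def add_table(schema: Schema, table: str, columns: dict[str, str]) -> Schema:
    """Table-level perturbation: add a new table."""
    out = deepcopy(schema)
    out[table] = dict(columns)
    return out


def remove_table(schema: Schema, table: str) -> Schema:
    """Table-level perturbation: drop an existing table."""
    out = deepcopy(schema)
    del out[table]
    return out


def to_ddl(schema: Schema) -> str:
    """Render the schema as CREATE TABLE statements for a model prompt."""
    stmts = []
    for table, cols in schema.items():
        body = ", ".join(f"{c} {t}" for c, t in cols.items())
        stmts.append(f"CREATE TABLE {table} ({body});")
    return "\n".join(stmts)


# Made-up example schema, used only for illustration.
original: Schema = {
    "singer": {"id": "INTEGER", "name": "TEXT"},
    "concert": {"id": "INTEGER", "singer_id": "INTEGER"},
}
perturbed = rename_column(original, "singer", "name", "full_name")
print(to_ddl(perturbed))
```

Each function returns a new schema rather than mutating its input, so the original and perturbed versions can both be kept for the side-by-side evaluation in step 3.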

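Key innovation: EvoSchema's main contributions are its comprehensive schema evolution taxonomy and its systematic evaluation method. Compared with prior work, it focuses more closely on the schema changes that occur in real-world databases and offers a controllable way to study model behavior under each perturbation type. It also motivates new training strategies, such as training on perturbed data, to improve model robustness.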
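The idea of training on perturbed data can be sketched as pairing one question with several schema variants and the matching SQL. The Python snippet below is a rough illustration only: the helper names and record layout are hypothetical, and the whole-word regex rewrite is a deliberately naive stand-in for the parser-based rewriting a real pipeline would likely need.

```python
import re


def rename_identifier(text: str, old: str, new: str) -> str:
    """Naive whole-word rename; assumes `old` never appears inside string literals."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def build_examples(question: str, schema_text: str, sql: str,
                   renames: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Return the original example plus one variant per rename perturbation."""
    examples = [{"question": question, "schema": schema_text, "sql": sql}]
    for old, new in renames:
        examples.append({
            "question": question,  # the question stays the same
            "schema": rename_identifier(schema_text, old, new),
            "sql": rename_identifier(sql, old, new),
        })
    return examples


# Made-up example, used only for illustration.
examples = build_examples(
    question="How many singers are there?",
    schema_text="singer(id, name)",
    sql="SELECT COUNT(*) FROM singer",
    renames=[("singer", "artist")],
)
for ex in examples:
    print(ex)
```

Because every variant shares the same question, a model trained on such groups has to read the schema to produce the right SQL instead of memorizing a question-to-query mapping.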

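Key design: EvoSchema's design considers several factors, including which perturbation types to include, how to control perturbation strength, and how to define the evaluation metrics. The perturbation types are chosen based on an analysis of schema changes in real-world databases. Perturbation strength is controlled to balance realism against evaluation difficulty. The evaluation metrics include SQL query accuracy and execution efficiency.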
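This summary does not say how accuracy is computed. One common choice for text-to-SQL is execution match, where a predicted query counts as correct if it returns the same rows as the gold query. The sketch below assumes that definition and uses Python's standard `sqlite3` module; the function name and the example tables are hypothetical.

```python
import sqlite3
from collections import Counter


def execution_match(setup_sql: str, gold_sql: str, pred_sql: str) -> bool:
    """Return True if both queries yield the same multiset of rows.

    Row order is ignored, which is too lenient for ORDER BY questions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)  # create and populate the perturbed schema
        gold_rows = conn.execute(gold_sql).fetchall()
        try:
            pred_rows = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return False  # e.g. the prediction references a renamed column
        return Counter(gold_rows) == Counter(pred_rows)
    finally:
        conn.close()


# Made-up schema after a hypothetical rename of singer.name to singer.full_name.
setup = """
CREATE TABLE singer (id INTEGER, full_name TEXT);
INSERT INTO singer VALUES (1, 'A');
INSERT INTO singer VALUES (2, 'B');
"""
gold = "SELECT full_name FROM singer"
stale = "SELECT name FROM singer"  # written against the old schema
fresh = "SELECT full_name FROM singer ORDER BY id DESC"

print(execution_match(setup, gold, stale))
print(execution_match(setup, gold, fresh))
```

The stale query fails because it still uses the pre-rename column, which is exactly the kind of error a schema evolution benchmark is meant to surface.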

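📊 Experimental Highlights

The experiments show that table-level perturbations affect model performance far more than column-level perturbations. Models trained on EvoSchema are significantly more robust to schema evolution than models trained on unperturbed data. Specific performance figures are not given in this summary, but the overall trend indicates that EvoSchema can effectively improve a model's resistance to perturbation.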
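A comparison between column-level and table-level perturbations comes down to grouping per-example outcomes by perturbation level. The following Python sketch shows one way to do that; the record format is an assumption, and the toy records are placeholders for exercising the function, not results from the paper.

```python
from collections import defaultdict


def accuracy_by_level(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate (perturbation level, is_correct) pairs into per-level accuracy."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # level -> [hits, count]
    for level, correct in records:
        totals[level][0] += int(correct)
        totals[level][1] += 1
    return {level: hits / count for level, (hits, count) in totals.items()}


# Placeholder outcomes only; these are NOT measurements from EvoSchema.
toy_records = [
    ("column", True),
    ("column", True),
    ("table", True),
    ("table", False),
]
print(accuracy_by_level(toy_records))
```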

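🎯 Applications

EvoSchema's findings apply to text-to-SQL systems that must handle changing database schemas, such as intelligent assistants, data analysis platforms, and automated report generation tools. Improving model robustness under schema evolution can reduce manual intervention, improve system reliability and availability, and lower maintenance costs. The work also helps advance database design and model training strategies.

📄 Abstract (Original)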

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either mainly focuses on simply paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate the model robustness issue under the schema evolution, which is insufficient when facing the increasingly complex and rich database schema changes in reality, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance compared to column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. The models trained on EvoSchema's diverse schema designs can force the model to distinguish the schema difference for the same questions to avoid learning spurious patterns, which demonstrate remarkable robustness compared to those trained on unperturbed data on average. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.
