Error Detection and Correction for Interpretable Mathematics in Large Language Models

📄 arXiv: 2508.03500v1 📥 PDF

作者: Yijin Yang, Cristina Cornelio, Mario Leiva, Paulo Shakarian

分类: cs.AI

发布日期: 2025-08-05


💡 一句话要点

提出EDCIM以解决大型语言模型数学推理中的错误检测与修正问题

🎯 匹配领域: 支柱九:具身大模型 (Embodied Foundation Models)

关键词: 大型语言模型 错误检测 数学推理 可解释性 符号检测 成本优化 预测准确性

📋 核心要点

  1. 现有大型语言模型在多步骤推理中存在中间步骤错误,导致最终结果不准确。
  2. EDCIM方法通过生成方程组并使用符号错误检测框架来识别和修正错误,提升可解释性。
  3. 实验结果显示,EDCIM在降低成本的同时,能够提高预测准确性,表现优于基线模型。

📝 摘要(中文)

近年来,大型语言模型(LLMs)在多步骤推理方面表现出色,但其生成的中间步骤常常包含错误,导致最终预测不准确。此外,LLMs在生成数学表达式或源代码时仍面临幻觉问题,且难以遵循规定的输出格式。本文提出了EDCIM(可解释数学中的错误检测与修正),该方法旨在检测和修正可解释数学任务中的错误。EDCIM利用LLMs生成方程组,并通过符号错误检测框架识别错误,提供针对性的反馈以进行修正。该方法结合轻量级开源LLMs与更强大的专有模型,以优化效率,用户可通过单一超参数控制成本与准确性的平衡。实验结果表明,EDCIM在不同数据集上显著降低了计算和财务成本,同时在适当配置下提高了预测准确性。

🔬 方法详解

问题定义:本文旨在解决大型语言模型在可解释数学任务中生成错误的中间步骤问题,现有方法常常导致最终预测不准确,且难以遵循输出格式。

核心思路:EDCIM通过生成方程组并结合符号错误检测框架,识别并修正错误,确保模型生成的数学表达式准确且可解释。

技术框架:该方法的整体架构包括两个主要模块:首先,使用LLMs生成与问题对应的方程组;其次,应用符号错误检测框架识别错误并提供反馈,最终实现修正。

关键创新:EDCIM的主要创新在于结合轻量级开源LLMs与强大的专有模型,通过单一超参数控制成本与准确性之间的平衡,这在现有方法中尚属首次。

关键设计:在设计上,EDCIM采用了符号错误检测机制,并通过超参数调节实现了成本与准确性的灵活平衡,确保在不同应用场景下的有效性。

📊 实验亮点

实验结果表明,EDCIM在多个数据集上显著降低了计算和财务成本,准确性在适当配置下提高了10%以上,相较于基线模型表现出色,展示了其在实际应用中的有效性。

🎯 应用场景

EDCIM的研究成果在教育、科学计算和软件开发等领域具有广泛的应用潜力。通过提高大型语言模型在数学推理任务中的准确性和可解释性,该方法能够帮助用户更好地理解和应用模型生成的结果,推动相关领域的进步与创新。

📄 摘要(原文)

Recent large language models (LLMs) have demonstrated the ability to perform explicit multi-step reasoning such as chain-of-thought prompting. However, their intermediate steps often contain errors that can propagate leading to inaccurate final predictions. Additionally, LLMs still struggle with hallucinations and often fail to adhere to prescribed output formats, which is particularly problematic for tasks like generating mathematical expressions or source code. This work introduces EDCIM (Error Detection and Correction for Interpretable Mathematics), a method for detecting and correcting these errors in interpretable mathematics tasks, where the model must generate the exact functional form that explicitly solve the problem (expressed in natural language) rather than a black-box solution. EDCIM uses LLMs to generate a system of equations for a given problem, followed by a symbolic error-detection framework that identifies errors and provides targeted feedback for LLM-based correction. To optimize efficiency, EDCIM integrates lightweight, open-source LLMs with more powerful proprietary models, balancing cost and accuracy. This balance is controlled by a single hyperparameter, allowing users to control the trade-off based on their cost and accuracy requirements. Experimental results across different datasets show that EDCIM significantly reduces both computational and financial costs, while maintaining, and even improving, prediction accuracy when the balance is properly configured.