Automated Conjecture Resolution with Formal Verification

作者: Haocheng Ju, Guoxiong Gao, Jiedong Jiang, Bin Wu, Zeming Sun, Leheng Chen, Yutong Wang, Yuefeng Wang, Zichen Wang, Wanyi He, Peihao Wu, Liang Xiao, Ruochuan Liu, Bryan Dai, Bin Dong

分类: cs.LG, cs.AI

发布日期: 2026-04-07

💡 一句话要点

提出Rethlas-Archon框架，结合非形式化推理与形式化验证，自动解决并验证数学难题。

🎯 匹配领域: 支柱九：具身大模型 (Embodied Foundation Models)

关键词: 形式化验证 自动化定理证明 大型语言模型 数学推理 人机协作

📋 核心要点

现有大语言模型在数学推理方面取得进展，但自然语言推理的模糊性导致可靠验证仍具挑战。
提出Rethlas-Archon框架，结合非形式化推理和形式化验证，实现端到端自动问题求解。
成功解决交换代数开放问题，并在Lean 4中完成形式化验证，证明框架的有效性。

📝 摘要（中文）

本文提出了一个自动化的框架，用于解决研究级别的数学问题，该框架集成了自然语言推理与形式化验证，从而能够在最少的人工干预下实现端到端的问题解决。该框架包含两个组件：非形式化推理代理Rethlas和形式化验证代理Archon。Rethlas通过结合推理原语和定理搜索引擎Matlas，模拟人类数学家的工作流程，探索解决方案策略并构建候选证明。Archon配备了形式化定理搜索引擎LeanSearch，通过结构化任务分解、迭代改进和自动证明合成，将非形式化论证转换为形式化的Lean 4项目，确保机器可检查的正确性。使用该框架，我们自动解决了一个交换代数中的开放问题，并在Lean 4中正式验证了结果证明，基本上没有人为参与。实验表明，强大的定理检索工具能够发现和应用跨领域的数学技术，而形式化代理能够自主地填补非形式化论证中的重要空白。更广泛地说，我们的工作展示了一种有前景的数学研究范式，其中非形式化和形式化推理系统配备定理检索工具协同工作，以产生可验证的结果，大幅减少人工工作量，并提供人机协作数学研究的具体实例。

🔬 方法详解

问题定义：当前大型语言模型在数学推理方面取得了显著进展，但由于自然语言本身的模糊性，如何可靠地解决和验证研究级别的数学问题仍然是一个巨大的挑战。现有的方法往往依赖人工干预，效率低下且容易出错。

核心思路：本文的核心思路是将非形式化的自然语言推理与形式化的机器验证相结合。通过模拟人类数学家的工作流程，利用大型语言模型进行初步的推理和证明构建，然后通过形式化验证工具对结果进行严格的验证，从而提高数学问题解决的可靠性和效率。

技术框架：该框架包含两个主要组件：Rethlas（非形式化推理代理）和Archon（形式化验证代理）。Rethlas负责进行非形式化的推理和证明搜索，它结合了推理原语和定理搜索引擎Matlas，用于探索解决方案策略并构建候选证明。Archon则负责将Rethlas生成的非形式化论证转换为形式化的Lean 4项目，它配备了形式化定理搜索引擎LeanSearch，通过结构化任务分解、迭代改进和自动证明合成，确保机器可检查的正确性。两个组件协同工作，形成一个完整的自动化问题解决流程。

关键创新：该框架的关键创新在于将非形式化推理和形式化验证紧密结合，实现了端到端的自动化数学问题解决。Rethlas利用大型语言模型的推理能力进行初步的探索和证明构建，而Archon则利用形式化验证工具对结果进行严格的验证，从而避免了人工干预，提高了问题解决的可靠性和效率。此外，框架中使用的定理搜索引擎Matlas和LeanSearch能够有效地检索相关的数学知识，为问题解决提供了强大的支持。

关键设计：Rethlas的设计重点在于模拟人类数学家的推理过程，它通过结合推理原语和定理搜索引擎，能够有效地探索解决方案策略并构建候选证明。Archon的设计重点在于将非形式化论证转换为形式化的Lean 4项目，它通过结构化任务分解、迭代改进和自动证明合成，确保机器可检查的正确性。具体的参数设置、损失函数和网络结构等技术细节在论文中没有详细描述，属于未知信息。

🖼️ 关键图片

📊 实验亮点

该框架成功自动解决了一个交换代数中的开放问题，并在Lean 4中完成了形式化验证，基本上没有人为参与。实验结果表明，强大的定理检索工具能够发现和应用跨领域的数学技术，而形式化代理能够自主地填补非形式化论证中的重要空白。具体的性能数据和提升幅度在论文中没有明确给出。

🎯 应用场景

该研究成果可应用于自动化定理证明、数学知识发现、智能教育等领域。通过人机协作，可以加速数学研究进程，降低人工成本，并为数学教育提供新的工具和方法。未来，该框架有望扩展到其他科学领域，促进科学研究的自动化和智能化。

📄 摘要（原文）

Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.

Automated Conjecture Resolution with Formal Verification

💡 一句话要点

📋 核心要点

📝 摘要（中文）

🔬 方法详解

🖼️ 关键图片

📊 实验亮点

🎯 应用场景

📄 摘要（原文）

⭐ 我的收藏

📁 新建收藏夹

⚙️ 管理收藏夹

🔍 搜索论文

🔐 登录 / 注册

👤 用户管理