Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

Authors: Jingchang Chen, Hongxuan Tang, Zheng Chu, Qianglong Chen, Zekun Wang, Ming Liu, Bing Qin

Categories: cs.CL, cs.SE

Published: 2024-05-30 (updated: 2024-11-03)

Comments: NeurIPS 2024 oral

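💡 One-Sentence Summary

Proposes FunCoder, a framework that combines divide-and-conquer with functional consensus to generate code for complex requirements.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: code generation, divide-and-conquer, functional consensus, complex requirements, self-improvement, machine learning, software development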

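📋 Key Points

Existing code generation methods struggle with complex requirements, particularly in planning accurately in advance and in self-testing reliably.
This paper proposes the FunCoder framework, which simplifies code generation through a divide-and-conquer strategy and functional consensus, recursively decomposing a problem into sub-functions.
Experiments show strong results on several benchmarks: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% on HumanEval.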

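📝 Abstract (Translated Summary)

Although large language models have made progress in code generation, they still struggle to generate programs that meet complex requirements. Existing methods use plan-and-solve decomposition to reduce complexity and use self-tests to refine the generated program. However, planning deeply nested requirements in advance is difficult, and self-improvement only works if the tests themselves are accurate. To address this, the paper proposes FunCoder, a code generation framework that combines a divide-and-conquer strategy with functional consensus. During code generation, FunCoder recursively branches off sub-functions as smaller goals, represented as a tree hierarchy. These sub-functions are then composed to achieve more complex goals. In addition, functions are selected through a consensus formed by identifying similarities in program behavior, which mitigates error propagation. With GPT-3.5 and GPT-4, FunCoder outperforms state-of-the-art methods by +9.8% on average across HumanEval, MBPP, xCodeEval and MATH.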

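🔬 Method Details

Problem definition: The paper addresses the difficulty large language models have with complex requirements in code generation. Existing methods fall short in planning requirements ahead of time and in the accuracy of their self-tests.

Core idea: FunCoder uses divide-and-conquer to recursively decompose a complex problem into sub-functions, and uses functional consensus to limit error propagation and improve the correctness of the generated code.

Technical framework: As summarized here, the FunCoder pipeline has four main stages: requirement analysis, function decomposition, functional consensus, and code composition. It first analyzes the requirement, then recursively decomposes it into sub-functions organized as a tree, then selects among candidate functions by the consensus in their program behavior, and finally composes the sub-functions into the final program.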
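The recursive decomposition and bottom-up composition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `FuncNode`, `divide`, `conquer`, the `max_depth` cutoff, and the lookup table `TOY_DRAFTS` (which stands in for a language-model call) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FuncNode:
    """One node of the function tree: a goal plus the sub-goals it branched off."""
    name: str
    body: str = ""
    children: list["FuncNode"] = field(default_factory=list)

# Hypothetical stand-in for a model call: given a function name, return
# (draft source code, names of the sub-functions that the draft relies on).
Proposer = Callable[[str], tuple[str, list[str]]]

def divide(name: str, propose: Proposer, depth: int = 0, max_depth: int = 4) -> FuncNode:
    """Draft `name`, then recurse into every sub-function the draft introduces."""
    body, subgoals = propose(name)
    node = FuncNode(name, body)
    if depth < max_depth:  # assumed safety cutoff; deeper sub-goals stay unexpanded
        for sub in subgoals:
            node.children.append(divide(sub, propose, depth + 1, max_depth))
    return node

def conquer(node: FuncNode) -> str:
    """Compose bottom-up: emit children first so callees precede their callers."""
    parts = [conquer(child) for child in node.children]
    parts.append(node.body)
    return "\n\n".join(parts)

# Toy drafts used instead of a real model, purely for demonstration.
TOY_DRAFTS = {
    "sum_of_primes": (
        "def sum_of_primes(n):\n"
        "    return sum(k for k in range(2, n + 1) if is_prime(k))",
        ["is_prime"],
    ),
    "is_prime": (
        "def is_prime(k):\n"
        "    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))",
        [],
    ),
}

def toy_propose(name: str) -> tuple[str, list[str]]:
    return TOY_DRAFTS[name]

tree = divide("sum_of_primes", toy_propose)
program = conquer(tree)

namespace: dict = {}
exec(program, namespace)  # run the composed program in an isolated namespace
print(namespace["sum_of_primes"](10))  # primes up to 10 are 2, 3, 5, 7 -> 17
```

In this sketch the tree is built depth-first and the program is assembled in post-order, so each helper is defined before the function that calls it.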

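Key innovation: FunCoder's main contributions are dynamic function decomposition and functional consensus. According to the paper's analysis, dynamic decomposition is capable of handling complex requirements, and functional consensus prevails over self-testing in correctness evaluation.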
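One way to picture consensus by behavioral similarity is to run several candidate implementations on shared inputs and keep the candidate whose outputs agree most with the others. The sketch below makes that concrete under stated assumptions: the pairwise agreement score, the helper names `run_safely` and `select_by_consensus`, and the toy candidates are illustrative choices, not the scoring rule or code from the paper.

```python
from typing import Any, Callable, Sequence

_FAILED = object()  # sentinel for a candidate that raised on some input

def run_safely(fn: Callable[[Any], Any], x: Any) -> Any:
    """Return fn(x), or the _FAILED sentinel if the candidate raises."""
    try:
        return fn(x)
    except Exception:
        return _FAILED

def select_by_consensus(
    candidates: Sequence[Callable[[Any], Any]],
    inputs: Sequence[Any],
) -> tuple[int, list[float]]:
    """Pick the candidate whose behavior is most similar to all the others.

    Assumed scoring: a candidate's score is the sum, over every other
    candidate, of the fraction of inputs on which both produce equal outputs.
    Failed runs never count as agreement.
    """
    if not inputs:
        raise ValueError("need at least one input to compare behavior")
    outputs = [[run_safely(fn, x) for x in inputs] for fn in candidates]

    def agreement(a: list, b: list) -> float:
        same = sum(1 for u, v in zip(a, b) if u is not _FAILED and u == v)
        return same / len(inputs)

    scores = [
        sum(agreement(outputs[i], outputs[j])
            for j in range(len(candidates)) if j != i)
        for i in range(len(candidates))
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Three toy candidates for "absolute value"; the last one is buggy.
candidates = [
    lambda x: x if x >= 0 else -x,
    lambda x: max(x, -x),
    lambda x: x,  # wrong for negative inputs
]
best, scores = select_by_consensus(candidates, inputs=[-2, -1, 0, 3])
print(best, scores)
```

Note that no ground-truth tests are needed here: the buggy candidate is outvoted simply because its behavior disagrees with the majority on the negative inputs.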

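Key design: The abstract does not give parameter settings or other implementation details. It presents FunCoder as a framework applied on top of existing models (GPT-3.5, GPT-4, StableCode-3b), where functional consensus is meant to keep individual sub-functions accurate so that they compose correctly.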

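📊 Experimental Highlights

With GPT-3.5 and GPT-4, FunCoder outperforms state-of-the-art methods by +9.8% on average across the HumanEval, MBPP, xCodeEval and MATH benchmarks. With FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval.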

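🎯 Application Scenarios

The FunCoder framework could be applied in software development, automated testing, and intelligent programming assistants, helping developers generate code for complex requirements more efficiently. Going forward, the technique may help make code generation more capable and improve the overall efficiency and quality of software development.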

📄 Abstract (Original)

Despite recent progress made by large language models in code generation, they still struggle with programs that meet complex requirements. Recent work utilizes plan-and-solve decomposition to decrease the complexity and leverage self-tests to refine the generated program. Yet, planning deep-inside requirements in advance can be challenging, and the tests need to be accurate to accomplish self-improvement. To this end, we propose FunCoder, a code generation framework incorporating the divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composited to attain more complex objectives. Additionally, we designate functions via a consensus formed by identifying similarities in program behavior, mitigating error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average in HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: With FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis reveals that our proposed dynamic function decomposition is capable of handling complex requirements, and the functional consensus prevails over self-testing in correctness evaluation.
