P4OMP: Retrieval-Augmented Prompting for OpenMP Parallelism in Serial Code

Authors: Wali Mohammad Abdullah, Azmain Kabir

Categories: cs.SE, cs.AI

Published: 2025-06-28

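💡 One-sentence takeaway

Proposes P4OMP, a retrieval-augmented prompting framework that adds OpenMP parallelism to serial C/C++ code.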

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: OpenMP, parallel computing, code generation, retrieval-augmented generation, high-performance computing, C++, machine learning

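📋 Key points

When converting serial C/C++ code to OpenMP parallel code, existing approaches often produce syntax errors and invalid directive combinations.
P4OMP uses retrieval-augmented generation, drawing on knowledge from OpenMP tutorials, to generate parallel code without any model fine-tuning.
On a benchmark of 108 real-world C++ programs, P4OMP reaches 100% compilation success on all parallelizable cases, whereas the GPT-3.5-Turbo baseline fails to compile 20 of the 108.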

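📝 Abstract (summary)

We present P4OMP, a retrieval-augmented framework that uses large language models (LLMs) to transform serial C/C++ code into OpenMP-annotated parallel code. To our knowledge, it is the first system to apply retrieval-based prompting for OpenMP pragma correctness without model fine-tuning or compiler instrumentation. P4OMP combines Retrieval-Augmented Generation (RAG) with structured instructional knowledge from OpenMP tutorials to make prompt-driven code generation more reliable. By grounding generation in the retrieved context, P4OMP improves syntactic correctness over baseline prompting with GPT-3.5-Turbo. We evaluate P4OMP on 108 real-world C++ programs: it achieves 100% compilation success on all parallelizable cases, while the baseline fails to compile in 20 of the 108 cases.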

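🔬 Method details

Problem definition: The paper targets the syntax errors and invalid directive combinations that arise when serial C/C++ code is converted to OpenMP parallel code. Existing approaches often depend on model fine-tuning or compiler instrumentation, which makes them less efficient and less reliable.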
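To make the target concrete, here is a minimal sketch of the kind of transformation described: a serial loop and an OpenMP-annotated counterpart. The function names and the loop itself are illustrative assumptions, not taken from the paper's benchmark. The `reduction` clause is standard OpenMP; leaving it out would keep `sum` shared across threads, which is one example of the general class of scoping errors the abstract mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference (illustrative): sum of squares of the input values.
double sum_squares_serial(const std::vector<double>& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i] * v[i];
    }
    return sum;
}

// OpenMP-annotated version of the same loop (illustrative).
// reduction(+ : sum) gives each thread a private partial sum and combines
// the partial sums at the end; without it, updates to `sum` would race.
// When built without OpenMP support, the pragma is ignored and the loop
// simply runs serially.
double sum_squares_omp(const std::vector<double>& v) {
    // A signed loop index is accepted by every OpenMP version.
    const long n = static_cast<long>(v.size());
    assert(n >= 0);  // guards against overflow in the signed conversion
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const double x = v[static_cast<std::size_t>(i)];
        sum += x * x;
    }
    return sum;
}
```

With GCC or Clang, OpenMP is enabled with the `-fopenmp` flag. Floating-point reductions may differ from the serial result in the last bits because the summation order changes; for small integer-valued inputs the two versions agree exactly.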

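Core idea: P4OMP applies Retrieval-Augmented Generation (RAG), combining structured knowledge from OpenMP tutorials with the prompt to make prompt-driven code generation more accurate and reliable. The generated code is therefore more likely to be correct in both syntax and directive usage.

Framework: The overall architecture consists of a retrieval module, a generation module, and a validation module. The retrieval module first fetches relevant OpenMP examples and tutorial material from a knowledge base; the generation module then produces parallel code from the retrieved context; finally, the validation module checks that the generated code is correct and compiles.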
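The retrieval and prompt-assembly stages of such a pipeline can be sketched as follows. This is a toy under stated assumptions: the `Snippet` type, the function names, the prompt wording, and the token-overlap scoring are all hypothetical stand-ins, since the summary does not specify how P4OMP retrieves or formats its context. The LLM call and the compile-based validation step are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical knowledge-base entry: one tutorial note about a directive.
struct Snippet {
    std::string topic;  // keywords describing when the note applies
    std::string text;   // guidance to place in the prompt
};

// Whitespace tokenizer; a stand-in for whatever representation a real
// retriever would use.
std::set<std::string> tokenize(const std::string& s) {
    std::istringstream in(s);
    std::set<std::string> out;
    std::string word;
    while (in >> word) out.insert(word);
    return out;
}

// Toy retrieval: return up to k snippets whose topic shares the most
// tokens with the query. Ties keep knowledge-base order.
std::vector<Snippet> retrieve(const std::vector<Snippet>& kb,
                              const std::string& query, std::size_t k) {
    assert(k > 0);
    const std::set<std::string> q = tokenize(query);
    typedef std::pair<std::size_t, std::size_t> Scored;  // (score, kb index)
    std::vector<Scored> scored;
    for (std::size_t i = 0; i < kb.size(); ++i) {
        std::size_t score = 0;
        for (const std::string& t : tokenize(kb[i].topic)) score += q.count(t);
        scored.push_back(Scored(score, i));
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& a, const Scored& b) {
                         return a.first > b.first;
                     });
    std::vector<Snippet> top;
    for (std::size_t i = 0; i < scored.size() && i < k; ++i)
        top.push_back(kb[scored[i].second]);
    return top;
}

// Assemble the prompt: retrieved guidance first, then the serial code.
std::string build_prompt(const std::vector<Snippet>& context,
                         const std::string& serial_code) {
    std::ostringstream out;
    out << "Use the following OpenMP guidance:\n";
    for (const Snippet& s : context) out << "- " << s.text << "\n";
    out << "Add OpenMP pragmas to this serial code:\n" << serial_code << "\n";
    return out.str();
}
```

In use, one would call `retrieve` with the serial code as the query, pass the result to `build_prompt`, send the prompt to the model, and then compile the response to check it. Keeping retrieval separate from prompt assembly mirrors the modular split described above and lets either part be swapped out independently.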

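Key innovation: According to the authors, P4OMP is the first system to apply retrieval-based prompting to OpenMP pragma correctness, avoiding the syntax errors and invalid directive combinations common in earlier approaches. Because it needs no model fine-tuning, it is also easier to adopt.

Key design: P4OMP uses a structured retrieval strategy so that the retrieved context is closely relevant to the generation task. The generation module's settings are tuned to improve the quality of the generated code and its compilation success rate.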

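📊 Experimental highlights

On 108 real-world C++ programs, P4OMP achieves 100% compilation success on all parallelizable cases, while the GPT-3.5-Turbo baseline fails to compile in 20 cases, a clear advantage in syntactic correctness and directive usage. P4OMP also shows strong runtime scaling across seven compute-intensive benchmarks on an HPC cluster.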

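🎯 Application scenarios

P4OMP has broad potential in high-performance computing (HPC), particularly where existing serial code must be parallelized efficiently. It can help developers convert legacy code to parallel code quickly, improving computational efficiency, lowering development cost, and supporting scientific computing and engineering applications.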

📄 Abstract (original)

We present P4OMP, a retrieval-augmented framework for transforming serial C/C++ code into OpenMP-annotated parallel code using large language models (LLMs). To our knowledge, this is the first system to apply retrieval-based prompting for OpenMP pragma correctness without model fine-tuning or compiler instrumentation. P4OMP leverages Retrieval-Augmented Generation (RAG) with structured instructional knowledge from OpenMP tutorials to improve the reliability of prompt-driven code generation. By grounding generation in the retrieved context, P4OMP improves syntactic correctness compared to baseline prompting with GPT-3.5-Turbo. We evaluate P4OMP against a baseline, GPT-3.5-Turbo without retrieval, on a comprehensive benchmark of 108 real-world C++ programs drawn from Stack Overflow, PolyBench, and NAS benchmark suites. P4OMP achieves 100% compilation success on all parallelizable cases, while the baseline fails to compile in 20 out of 108 cases. Six cases that rely on non-random-access iterators or thread-unsafe constructs are excluded due to fundamental OpenMP limitations. A detailed analysis demonstrates how P4OMP consistently avoids scoping errors, syntactic misuse, and invalid directive combinations that commonly affect baseline-generated code. We further demonstrate strong runtime scaling across seven compute-intensive benchmarks on an HPC cluster. P4OMP offers a robust, modular pipeline that significantly improves the reliability and applicability of LLM-generated OpenMP code.
