KEN: Kernel Extensions using Natural Language

Authors: Yusheng Zheng, Yiwei Yang, Maolin Chen, Andrew Quinn

Categories: cs.AI, cs.OS

Published: 2023-12-09

💡 One-sentence summary

KEN: extending the kernel through natural language to simplify eBPF program development.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: eBPF, kernel extensions, natural language programming, large language models, program synthesis

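📋 Key points

Writing eBPF programs is hard: developers need deep knowledge of kernel internals and must work within the control-flow and data-access restrictions enforced by the eBPF verifier.
KEN uses a large language model to synthesize eBPF programs from natural language prompts, combining program comprehension, symbolic execution, and feedback loops.
In the evaluation, KEN produces correct eBPF programs in 80% of cases, a 2.67x improvement over an LLM-based program synthesis baseline.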

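📝 Abstract (translated from the Chinese version)

Modifying and extending an operating system is essential for improving its security, reliability, and performance. The extended Berkeley Packet Filter (eBPF) ecosystem has become the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel, which the system executes before or after existing logic. Although the eBPF ecosystem offers a flexible mechanism for kernel extension, writing eBPF programs remains difficult today. An eBPF developer must understand the internals of the operating system well enough to decide where to place the logic, and must cope with the restrictions that the eBPF verifier enforces on the program's control flow and data accesses. This paper presents KEN, an alternative framework that eases eBPF development by allowing kernel extensions to be written in natural language. KEN builds on recent advances in large language models (LLMs) to synthesize an eBPF program from a user's English prompt. To ensure that the LLM's output is semantically equivalent to the user's prompt, KEN combines LLM-driven program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that lets it combine the results of program synthesis and program comprehension, building on the success LLMs have recently shown on each of these tasks individually. To evaluate KEN, the authors developed a new corpus of natural language prompts for eBPF programs. They show that KEN produces correct eBPF programs in 80% of cases, a 2.67x improvement over an LLM-driven program synthesis baseline.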

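🔬 Method in detail

Problem definition: Developing eBPF programs is difficult. It requires deep kernel knowledge and is subject to many restrictions imposed by the eBPF verifier, which makes development slow and error-prone. Existing practice relies on writing and debugging eBPF code by hand, with little automation or ease of use.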

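Core idea: KEN uses a large language model (LLM) to turn a natural language description of a kernel extension into an eBPF program automatically. Programming in natural language lowers the barrier to entry and speeds up development. Program comprehension, symbolic execution, and feedback loops are combined to ensure that the generated eBPF program is semantically consistent with the user's natural language description.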

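Technical framework: KEN's overall framework consists of the following main modules: 1) Natural language input: receives the user's natural language prompt. 2) LLM program synthesis: uses an LLM to turn the prompt into an eBPF program. 3) Program comprehension: uses an LLM to understand what the generated eBPF program does. 4) Symbolic execution: symbolically executes the eBPF program to verify that its behavior matches what is expected. 5) Feedback loop: feeds the symbolic execution results back to the LLM to iteratively refine the generated eBPF program.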
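The module sequence above can be read as a generate, check, retry loop. The following is a minimal sketch of that control flow only, not KEN's implementation: the function names (`synthesize_with_feedback`, `fake_synthesize`, `fake_comprehend`, `fake_verify`), the `Verdict` type, and the `max_rounds` limit are all hypothetical, and the three stages are stubbed with plain Python functions instead of an LLM and a symbolic executor.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Result of the verification stage (hypothetical type)."""
    ok: bool
    feedback: str = ""


def synthesize_with_feedback(
    prompt: str,
    synthesize: Callable[[str, str], str],   # (prompt, feedback) -> candidate program
    comprehend: Callable[[str], str],        # candidate -> description of its behavior
    verify: Callable[[str, str], Verdict],   # (candidate, description) -> verdict
    max_rounds: int = 3,                     # assumed retry budget
) -> Optional[str]:
    """Return the first candidate that passes verification, or None."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = synthesize(prompt, feedback)
        description = comprehend(candidate)
        verdict = verify(candidate, description)
        if verdict.ok:
            return candidate
        # Carry the failure message into the next synthesis attempt.
        feedback = verdict.feedback
    return None


# Stub stages standing in for the LLM and the symbolic executor.
def fake_synthesize(prompt: str, feedback: str) -> str:
    # Pretend the first attempt is wrong and the retry is right.
    return "prog_v2" if feedback else "prog_v1"


def fake_comprehend(candidate: str) -> str:
    return f"description of {candidate}"


def fake_verify(candidate: str, description: str) -> Verdict:
    if candidate == "prog_v2":
        return Verdict(ok=True)
    return Verdict(ok=False, feedback=f"{candidate} failed verification")


result = synthesize_with_feedback(
    "count openat calls per process",
    fake_synthesize, fake_comprehend, fake_verify,
)
print(result)  # prog_v2
```

Passing the stages in as callables keeps the loop independent of any particular model or verifier, which is convenient for testing the control flow in isolation.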

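Key innovation: KEN combines LLM-driven program synthesis, program comprehension, and symbolic execution into a complete, automated framework for generating eBPF programs. In the symbolic execution module in particular, KEN uses a novel structure that combines the results of program synthesis and program comprehension, so that generated eBPF programs can be verified and refined more effectively. This combination builds on the strengths LLMs have shown in both program synthesis and program comprehension.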
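This summary does not spell out how the two results are combined. One plausible reading, offered here purely as an assumption, is that comprehension yields conditions the program should satisfy and the verification step searches for an input that violates them. The toy below illustrates that idea with a bounded exhaustive search over inputs, which is a stand-in for symbolic execution and not the same technique. `find_counterexample`, the two filter functions, and the port-23 rule are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_counterexample(
    program: Callable[..., object],
    precondition: Callable[..., bool],
    postcondition: Callable[..., bool],
    domain: Iterable[Tuple],
) -> Optional[Tuple]:
    """Return the first input in `domain` that satisfies the precondition
    but whose output violates the postcondition, or None if there is none."""
    for args in domain:
        if not precondition(*args):
            continue
        result = program(*args)
        if not postcondition(*args, result):
            return args
    return None


# Two candidate "programs" for the prompt "drop packets sent to port 23".
def buggy_filter(port: int) -> str:
    return "DROP" if port == 22 else "PASS"   # wrong port on purpose


def fixed_filter(port: int) -> str:
    return "DROP" if port == 23 else "PASS"


# Conditions a comprehension step might produce (assumed for illustration).
def pre(port: int) -> bool:
    return 0 <= port <= 65535


def post(port: int, action: str) -> bool:
    return (action == "DROP") == (port == 23)


ports = [(p,) for p in range(65536)]
print(find_counterexample(buggy_filter, pre, post, ports))  # (22,)
print(find_counterexample(fixed_filter, pre, post, ports))  # None
```

A counterexample such as `(22,)` is the kind of concrete evidence that could be sent back to the synthesis step as feedback. A real symbolic executor would reason over path constraints rather than enumerate inputs.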

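Key design: KEN's key design elements include: 1) a corpus of natural language prompts for eBPF programs, used to evaluate KEN; 2) the choice of LLM and any fine-tuning strategy, to improve synthesis accuracy; 3) the constraint generation and solving strategy for symbolic execution, to improve verification efficiency; 4) the iterative refinement procedure in the feedback loop, to improve the quality of the generated eBPF programs. Specific parameter settings, loss functions, network architecture, and similar technical details are not described in detail in the paper and remain unknown.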

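📊 Experimental highlights

The experiments show that KEN generates correct eBPF programs in 80% of cases, a 2.67x improvement over a baseline that uses an LLM alone for program synthesis. This result supports the effectiveness of the KEN framework and indicates that combining program comprehension, symbolic execution, and feedback loops can substantially improve the accuracy and reliability of LLM-generated eBPF programs.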
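The baseline's own success rate is not stated here, but it can be backed out from the two reported numbers. The short calculation below assumes that the 2.67x factor is the ratio of the two correctness rates on the same prompt corpus; the resulting figure is an inference, not a reported result.

```python
ken_accuracy = 0.80          # reported: correct programs in 80% of cases
improvement_factor = 2.67    # reported: improvement over the LLM baseline

# Assumption: improvement_factor = ken_accuracy / baseline_accuracy
implied_baseline = ken_accuracy / improvement_factor

print(f"Implied baseline accuracy: {implied_baseline:.1%}")  # 30.0%
```

Under that assumption, the LLM-only baseline would be correct on roughly 30% of prompts.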

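🎯 Application scenarios

KEN has broad potential applications in areas such as network security, performance monitoring, and system diagnostics. For example, it could be used to quickly develop eBPF programs that detect malicious traffic or that monitor system performance bottlenecks. By lowering the barrier to eBPF development, KEN could help bring eBPF to more domains and improve system security, reliability, and performance.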

📄 Abstract (original)

The ability to modify and extend an operating system is an important feature for improving a system's security, reliability, and performance. The extended Berkeley Packet Filters (eBPF) ecosystem has emerged as the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel that the system will execute before or after existing logic. While the eBPF ecosystem provides a flexible mechanism for kernel extension, it is difficult for developers to write eBPF programs today. An eBPF developer must have deep knowledge of the internals of the operating system to determine where to place logic and cope with programming limitations on the control flow and data accesses of their eBPF program enforced by the eBPF verifier. This paper presents KEN, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. KEN uses recent advances in large language models (LLMs) to synthesize an eBPF program given a user's English language prompt. To ensure that LLM's output is semantically equivalent to the user's prompt, KEN employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension and build on the recent success that LLMs have shown for each of these tasks individually. To evaluate KEN, we developed a new corpus of natural language prompts for eBPF programs. We show that KEN produces correct eBPF programs on 80% which is an improvement of a factor of 2.67 compared to an LLM-empowered program synthesis baseline.