AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software

Authors: Bin Wang, Wenjie Yu, Yilu Zhong, Hao Yu, Keke Lian, Chaohua Lu, Hongfang Zheng, Dong Zhang, Hui Li

Categories: cs.SE, cs.AI

Published: 2025-12-21

Notes: https://mp.weixin.qq.com/s/sI_LKPnA-BeCVYr9Ko4sqg https://github.com/Narwhal-Lab/aicode-in-the-wild-security-risk-report

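💡 One-Sentence Takeaway

The first large-scale empirical study of AI-generated code in the wild, revealing its security risks and how it is reshaping the software ecosystem.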

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: AI-generated code, software security, empirical study, code vulnerabilities, human-AI collaboration

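📋 Key Points

Large language models (LLMs) are increasingly used for code generation, but how widely they are used in real-world software development, and with what security impact, remains unclear.
The authors build a high-precision detection pipeline and a representative benchmark to distinguish AI-generated code from human-written code, and analyze its distribution across GitHub projects and CVE-linked code changes.
The study reveals how AI code is adopted in the software ecosystem, the security risks it carries, and the roles humans and AI play when they collaborate, providing a data foundation for future research.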

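📝 Abstract (Translated from Chinese)

This paper presents the first large-scale empirical study of AI-generated code (AIGCode) in the real world. The authors build a high-precision detection pipeline and a representative benchmark to distinguish AIGCode from human-written code, and apply them to (i) development commits from the top 1,000 GitHub repositories (2022-2025) and (ii) more than 7,000 recent CVE-linked code changes. This makes it possible to label commits, files, and functions along a human/AI axis and to trace how AIGCode moves through projects and vulnerability life cycles. The measurements show three ecological patterns. First, AIGCode already accounts for a substantial fraction of new code, but adoption is structured: AI concentrates in glue code, tests, refactoring, documentation, and other boilerplate, while core logic and security-critical configurations remain mostly human-written. Second, adoption has security consequences: some CWE families are overrepresented in AI-tagged code, and near-identical insecure templates recur across unrelated projects, suggesting that "AI-induced vulnerabilities" are propagated by shared models rather than by shared maintainers. Third, in human-AI edit chains, AI introduces high-throughput changes while humans act as security gatekeepers; when review is shallow, AI-introduced defects persist longer, remain exposed on network-accessible surfaces, and spread to more files and repositories. The authors will open-source the complete dataset and release analysis artifacts along with detailed documentation of their methodology and findings.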

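🔬 Method Details

Problem definition: The paper asks how AI-generated code (AIGCode) is actually used in modern software development, what security risks it carries, and how it affects the software ecosystem. Prior work lacks large-scale empirical evidence, so it cannot accurately assess how prevalent AIGCode is, what vulnerabilities it may introduce, or how humans and AI collaborate.

Core idea: Build a high-precision AIGCode detection pipeline and apply it to large, real-world code corpora (GitHub projects and CVE-linked code changes) in order to quantify AIGCode usage, identify potential security risks, and analyze human-AI collaboration patterns. This gives a more complete picture of how AIGCode affects software development.

Technical framework: The overall framework has four main stages: 1) Build the AIGCode detection pipeline: design an approach that distinguishes AIGCode from human-written code. 2) Build a representative benchmark: use it to evaluate the detection pipeline. 3) Collect and analyze data: gather code changes from GitHub and from CVE-linked sources, and label AIGCode with the detection pipeline. 4) Analyze ecological patterns: study how AIGCode is distributed within projects, what role it plays in vulnerability life cycles, and how humans and AI collaborate.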
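The labeling stage of the framework above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Detector` interface, the `0.9` threshold, the `toy_detector` stand-in, and all names are hypothetical, since the abstract does not describe the real detector. The sketch tags a function as AI-generated only when a score clears a high threshold (trading recall for precision), then aggregates function-level labels to a per-commit AI share.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FunctionChange:
    """One changed function inside a commit (hypothetical record layout)."""
    repo: str
    commit: str
    path: str
    name: str
    source: str


# Hypothetical detector interface: maps source text to a score in [0, 1],
# where a higher score means "more likely AI-generated".
Detector = Callable[[str], float]


def label_functions(
    changes: List[FunctionChange],
    detector: Detector,
    threshold: float = 0.9,  # assumed value; a high cut-off favors precision
) -> Dict[FunctionChange, str]:
    """Label each changed function as 'ai' or 'human'."""
    return {
        change: "ai" if detector(change.source) >= threshold else "human"
        for change in changes
    }


def ai_share_by_commit(
    labels: Dict[FunctionChange, str],
) -> Dict[Tuple[str, str], float]:
    """Aggregate function labels into the AI-tagged fraction per (repo, commit)."""
    totals: Dict[Tuple[str, str], int] = {}
    ai_counts: Dict[Tuple[str, str], int] = {}
    for change, label in labels.items():
        key = (change.repo, change.commit)
        totals[key] = totals.get(key, 0) + 1
        if label == "ai":
            ai_counts[key] = ai_counts.get(key, 0) + 1
    return {key: ai_counts.get(key, 0) / total for key, total in totals.items()}


def toy_detector(source: str) -> float:
    """Placeholder only: a real detector would be a trained classifier."""
    return 0.95 if "# generated" in source else 0.10


demo_changes = [
    FunctionChange("example/repo", "abc123", "util.py", "helper",
                   "# generated\ndef helper():\n    return 1\n"),
    FunctionChange("example/repo", "abc123", "core.py", "decide",
                   "def decide(x):\n    return x > 0\n"),
]
demo_labels = label_functions(demo_changes, toy_detector)
print(ai_share_by_commit(demo_labels))
```

The same aggregation could be keyed on `(repo, path)` to obtain file-level shares, mirroring the commit, file, and function granularity mentioned in the abstract.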

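Key innovations: 1) The first large-scale empirical study of how AIGCode is used in the real world. 2) A high-precision AIGCode detection pipeline that distinguishes AIGCode from human-written code. 3) Evidence on how AIGCode is adopted in the software ecosystem, what security risks it carries, and what roles humans and AI play in collaboration.

Key design: The abstract does not describe key design details such as the specific algorithms in the AIGCode detection pipeline, how the benchmark was constructed, or the specific metrics used in the data analysis; these remain unknown.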

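📊 Experimental Highlights

The study finds that AIGCode already makes up a substantial share of new code, concentrated in areas such as glue code, tests, and documentation. Some CWE families are overrepresented in AI-tagged code, and "AI-induced vulnerabilities" recur across unrelated projects. In human-AI collaboration, AI introduces a high volume of changes, and when human review is shallow, defects can persist for longer and spread further.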
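One plausible way to quantify "overrepresented" is a share ratio: a CWE family's share among AI-tagged findings divided by its share among human-tagged findings, with values above 1 indicating overrepresentation. The sketch below assumes that definition; the paper's actual metric is not given in the abstract, the function name is hypothetical, and the records are synthetic and are not results from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def overrepresentation(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute, per CWE, (share in AI-tagged findings) / (share in human-tagged findings).

    `records` holds (label, cwe) pairs with label in {"ai", "human"}.
    CWEs with no human-tagged findings are skipped because the ratio is undefined.
    """
    ai: Counter = Counter()
    human: Counter = Counter()
    for label, cwe in records:
        if label == "ai":
            ai[cwe] += 1
        elif label == "human":
            human[cwe] += 1

    ai_total = sum(ai.values())
    human_total = sum(human.values())
    ratios: Dict[str, float] = {}
    if ai_total == 0 or human_total == 0:
        return ratios  # nothing to compare

    for cwe, ai_count in ai.items():
        if human[cwe] == 0:
            continue  # undefined ratio; a real analysis might smooth the counts instead
        ratios[cwe] = (ai_count / ai_total) / (human[cwe] / human_total)
    return ratios


# Synthetic records for illustration only.
synthetic = (
    [("ai", "CWE-79")] * 3 + [("ai", "CWE-89")] * 1
    + [("human", "CWE-79")] * 1 + [("human", "CWE-89")] * 3
)
print(overrepresentation(synthetic))
```

A ratio alone says nothing about statistical significance; with small counts, a confidence interval or a significance test would be needed before drawing conclusions.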

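🎯 Application Scenarios

The findings can inform software security analysis, code quality assessment, and improvements to developer tooling. Identifying and controlling the security risks in AI-generated code can raise the overall security of software systems. The study can also help developers better understand the role AI plays in software development and refine how humans and AI collaborate.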
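As one example of such tooling, the abstract's observation that near-identical insecure templates recur across unrelated projects suggests a simple screening step: fingerprint code snippets after light normalization and flag fingerprints that appear in several repositories. The sketch below is an assumption-laden illustration, not the authors' method. It uses exact matching after stripping comments and collapsing whitespace, which is a much weaker notion of "near-identical" than token-based or AST-based clone detection, and all names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def fingerprint(source: str) -> str:
    """Hash a snippet after removing '#' comments and collapsing whitespace.

    Naive by design: the comment regex also strips '#' characters that
    appear inside string literals.
    """
    normalized = []
    for line in source.splitlines():
        line = re.sub(r"#.*$", "", line)           # drop Python-style comments
        line = re.sub(r"\s+", " ", line).strip()   # collapse runs of whitespace
        if line:
            normalized.append(line)
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def recurring_templates(
    snippets: Iterable[Tuple[str, str]],
    min_repos: int = 2,  # assumed cut-off for "recurs across projects"
) -> Dict[str, Set[str]]:
    """Return fingerprints seen in at least `min_repos` distinct repositories.

    `snippets` holds (repo, source) pairs.
    """
    repos_by_fingerprint: Dict[str, Set[str]] = defaultdict(set)
    for repo, source in snippets:
        repos_by_fingerprint[fingerprint(source)].add(repo)
    return {
        fp: repos
        for fp, repos in repos_by_fingerprint.items()
        if len(repos) >= min_repos
    }


# Synthetic snippets for illustration only.
demo_snippets = [
    ("repo-a", "query = 'SELECT * FROM users WHERE id = ' + user_id  # build query"),
    ("repo-b", "query  =  'SELECT * FROM users WHERE id = ' + user_id"),
    ("repo-c", "cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))"),
]
for fp, repos in recurring_templates(demo_snippets).items():
    print(fp[:12], sorted(repos))
```

Flagged fingerprints would still need manual or static-analysis review to decide whether the shared snippet is actually insecure.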

📄 Abstract (Original)

Large language models (LLMs) for code generation are becoming integral to modern software development, but their real-world prevalence and security impact remain poorly understood. We present the first large-scale empirical study of AI-generated code (AIGCode) in the wild. We build a high-precision detection pipeline and a representative benchmark to distinguish AIGCode from human-written code, and apply them to (i) development commits from the top 1,000 GitHub repositories (2022-2025) and (ii) 7,000+ recent CVE-linked code changes. This lets us label commits, files, and functions along a human/AI axis and trace how AIGCode moves through projects and vulnerability life cycles. Our measurements show three ecological patterns. First, AIGCode is already a substantial fraction of new code, but adoption is structured: AI concentrates in glue code, tests, refactoring, documentation, and other boilerplate, while core logic and security-critical configurations remain mostly human-written. Second, adoption has security consequences: some CWE families are overrepresented in AI-tagged code, and near-identical insecure templates recur across unrelated projects, suggesting "AI-induced vulnerabilities" propagated by shared models rather than shared maintainers. Third, in human-AI edit chains, AI introduces high-throughput changes while humans act as security gatekeepers; when review is shallow, AI-introduced defects persist longer, remain exposed on network-accessible surfaces, and spread to more files and repositories. We will open-source the complete dataset and release analysis artifacts and fine-grained documentation of our methodology and findings.
