VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation

📄 arXiv: 2508.18933v1

Authors: David Egea, Barproda Halder, Sanghamitra Dutta

Categories: cs.AI, cs.CR, cs.CY, cs.LG

Published: 2025-08-26


💡 One-Sentence Takeaway

Proposes the VISION framework to mitigate spurious correlations in code vulnerability detection

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: code vulnerability detection, graph neural networks, counterfactual augmentation, cybersecurity, explainable AI, machine learning, automated code review

📋 Key Points

  1. Existing graph neural networks for code vulnerability detection suffer from training-data imbalance and label noise, so models learn spurious correlations and fail to generalize to real-world scenarios.
  2. The proposed VISION framework augments the training data with generated counterfactual samples and trains the GNN on paired code examples, improving the robustness and interpretability of vulnerability detection.
  3. On the CWE-20 vulnerability detection task, VISION raises overall accuracy from 51.8% to 97.8% and pairwise contrast accuracy from 4.5% to 95.8%, a substantial improvement.

📝 Abstract (Summary)

Automatically detecting vulnerabilities in source code is a critical challenge in cybersecurity, directly affecting trust in digital systems and services. Graph Neural Networks (GNNs) have drawn attention for their ability to learn the structural and logical relationships in code, but their performance is severely constrained by training-data imbalance and label noise. This paper proposes a unified framework, VISION, that mitigates spurious correlations by systematically augmenting a counterfactual training dataset. The framework comprises counterfactual sample generation, GNN training on paired code examples with opposite labels, and graph-based interpretability analysis. Experimental results show that VISION markedly improves the accuracy and interpretability of vulnerability detection, advancing the transparency and trustworthiness of AI-based cybersecurity systems.

🔬 Method Details

Problem definition: This work addresses the spurious correlations that arise in existing code vulnerability detectors from training-data imbalance and label noise, which leave models unable to generalize to unseen real-world data.

Core idea: The heart of VISION is augmenting the training data with counterfactual samples, i.e., variants of the original samples with minimal semantic modifications but the opposite label, which push the model toward learning more robust features.
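
As a concrete illustration, consider what such a pair might look like for CWE-20 (improper input validation). The snippets and dictionary keys below are hypothetical, sketched only to show the minimal-edit, opposite-label structure; the paper's actual pairs come from its CWE-20-CFA benchmark.

```python
# Hypothetical counterfactual pair for CWE-20 (improper input
# validation). Snippets and keys are illustrative, not taken from
# the paper's CWE-20-CFA benchmark.

vulnerable = """
void copy_input(char *dst, const char *src) {
    strcpy(dst, src);               /* no length check on src */
}
"""

# Counterfactual: a minimal semantic edit (bounded copy) that flips
# the label from vulnerable (1) to non-vulnerable (0).
counterfactual = """
void copy_input(char *dst, const char *src) {
    strncpy(dst, src, DST_MAX - 1); /* bounds-checked copy */
    dst[DST_MAX - 1] = '\\0';
}
"""

pair = {"code": vulnerable, "label": 1,
        "counterfactual": counterfactual, "cf_label": 0}
```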

Technical framework: The framework consists of three modules: (i) generating counterfactual samples by prompting a large language model; (ii) training a graph neural network on paired code examples; and (iii) graph-based interpretability analysis that identifies the code statements most relevant to vulnerability predictions.
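
A minimal sketch of stage (i) follows, assuming an OpenAI-style chat API. The paper states only that counterfactuals are generated by prompting an LLM, so the model name and prompt wording here are assumptions.

```python
# Stage (i) sketch: prompt an LLM for a minimal label-flipping edit.
# The prompt text and model name are assumptions; only the general
# idea (LLM-prompted counterfactual generation) comes from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_counterfactual(code: str, label: int) -> str:
    """Ask the model for the smallest semantic edit that flips the label."""
    target = "non-vulnerable" if label == 1 else "vulnerable"
    prompt = (
        "Apply the smallest possible semantic change to the following "
        f"function so that it becomes {target}. Return only the code.\n\n"
        f"{code}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```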

Key innovation: The central technical contribution is the counterfactual generation mechanism, which systematically reduces the spurious correlations the model would otherwise learn. This differs fundamentally from conventional training, which relies directly on features of the original data.

Key design: A dedicated loss function balances the influence of positive and negative samples, and the structure of the graph neural network is designed to sharpen the model's focus on the features that matter.
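
The summary does not spell out the loss, but the pairing idea can be sketched as a cross-entropy averaged over each real/counterfactual pair, which makes every batch label-balanced by construction. The `model` interface below is an assumption, not the paper's architecture.

```python
# Sketch of paired GNN training, assuming a plain cross-entropy
# averaged over each real/counterfactual pair. The paper's actual
# loss may differ; `model` is any GNN mapping a code graph to class
# logits of shape (batch, 2).
import torch.nn.functional as F

def paired_loss(model, g_real, y_real, g_cf, y_cf):
    """Average cross-entropy over a real graph and its counterfactual twin."""
    loss_real = F.cross_entropy(model(g_real), y_real)
    loss_cf = F.cross_entropy(model(g_cf), y_cf)
    return 0.5 * (loss_real + loss_cf)
```

Because every real sample enters the batch alongside an opposite-label twin, the loss cannot be minimized by exploiting class imbalance or superficial code similarities shared across a pair.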

📊 Experimental Highlights

On the CWE-20 vulnerability detection task, VISION improves overall accuracy from 51.8% to 97.8%, pairwise contrast accuracy from 4.5% to 95.8%, and worst-group accuracy from 0.7% to 85.5%. These gains demonstrate the framework's effectiveness at reducing spurious learning and strengthening detection robustness. The authors further report improvements on three proposed metrics (intra-class attribution variance, inter-class attribution distance, and node score dependency) and release CWE-20-CFA, a benchmark of 27,556 real and counterfactual functions from the high-impact CWE-20 category.
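
Under assumed definitions, the two non-standard metrics can be computed as below: here a pair counts toward pairwise contrast accuracy only when both the real and the counterfactual member are classified correctly, and worst-group accuracy is the minimum per-group accuracy. The paper's formal definitions may differ in detail.

```python
# Assumed metric definitions, for illustration only.
from collections import defaultdict

def pairwise_contrast_accuracy(preds_real, y_real, preds_cf, y_cf):
    """Fraction of pairs where both members are predicted correctly."""
    hits = sum(int(pr == yr and pc == yc)
               for pr, yr, pc, yc in zip(preds_real, y_real, preds_cf, y_cf))
    return hits / len(y_real)

def worst_group_accuracy(preds, labels, groups):
    """Minimum accuracy over data groups (e.g., label x code pattern)."""
    stats = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for p, y, g in zip(preds, labels, groups):
        stats[g][0] += int(p == y)
        stats[g][1] += 1
    return min(c / n for c, n in stats.values())
```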

🎯 Application Scenarios

Potential applications include software security analysis, automated code review, and the development of vulnerability detection tools. By improving both the accuracy and the interpretability of vulnerability detection, VISION helps developers identify and fix security flaws in code more effectively, raising the overall security of software systems. As AI techniques continue to mature, the framework is well positioned to play a role across broader areas of cybersecurity.

📄 Abstract (Original)

Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn structural and logical code relationships in a data-driven manner. However, their performance is severely constrained by training data imbalances and label noise. GNNs often learn 'spurious' correlations from superficial code similarities, producing detectors that fail to generalize well to unseen real-world data. In this work, we propose a unified framework for robust and interpretable vulnerability detection, called VISION, to mitigate spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications but opposite labels. Our framework includes: (i) generating counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN training on paired code examples with opposite labels; and (iii) graph-based interpretability to identify the crucial code statements relevant for vulnerability predictions while ignoring spurious ones. We find that VISION reduces spurious learning and enables more robust, generalizable detection, improving overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the Common Weakness Enumeration (CWE)-20 vulnerability. We further demonstrate gains using proposed metrics: intra-class attribution variance, inter-class attribution distance, and node score dependency. We also release CWE-20-CFA, a benchmark of 27,556 functions (real and counterfactual) from the high-impact CWE-20 category. Finally, VISION advances transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.