DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack

📄 arXiv: 2512.16182v1

Authors: Hao Li, Yubing Ren, Yanan Cao, Yingjie Li, Fang Fang, Shi Wang, Li Guo

Categories: cs.CR, cs.CL

Published: 2025-12-18


💡 One-sentence takeaway

Proposes DualGuard, an LLM watermarking algorithm that defends against both paraphrase and spoofing attacks.

🎯 Matched area: Pillar 9: Embodied Foundation Models

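Keywords: large language models, watermarking, spoofing attack, paraphrase attack, intellectual property protection, adaptive mechanism, robustness, traceability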

📋 Key points

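  1. Existing watermarking algorithms mainly defend against paraphrase attacks and overlook spoofing attacks, which undermines watermark reliability.
  2. DualGuard uses an adaptive dual-stream watermarking mechanism that dynamically injects two complementary watermark signals, allowing it to defend against both attack types.
  3. Experiments across multiple datasets show that DualGuard improves watermark detectability and robustness.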

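📝 Abstract (summary)

As cloud services have developed rapidly, large language models (LLMs) have become increasingly accessible, which also raises the risk of model abuse. Watermarking is an effective way to mitigate misuse and protect intellectual property, but existing watermarking algorithms focus mainly on defending against paraphrase attacks and overlook spoofing attacks, which can inject harmful content. To address this, the paper proposes DualGuard, the first watermarking algorithm that defends against both paraphrase and spoofing attacks. DualGuard uses an adaptive dual-stream watermarking mechanism that dynamically injects two complementary watermark signals based on the semantic content, so it can both detect and trace spoofing attacks. Extensive experiments across multiple datasets and language models show that DualGuard performs well on detectability, robustness, traceability, and text quality, advancing LLM watermarking for real-world use.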

🔬 Method details

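Problem definition: The paper addresses the weakness of existing watermarking algorithms against spoofing attacks, which can inject harmful content into watermarked text and reduce watermark reliability.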

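Core idea: DualGuard uses an adaptive dual-stream watermarking mechanism. By dynamically injecting two complementary watermark signals, it defends against paraphrase attacks while also detecting and tracing spoofing attacks.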
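To make "dynamically injecting one of two complementary signals" concrete, here is a minimal sketch. This summary does not describe DualGuard's actual signals or its semantic gating, so the sketch substitutes a generic keyed green-list watermark and a toy parity rule. `VOCAB`, `green_list`, `choose_stream`, and `generate` are hypothetical names used for illustration only.

```python
import hashlib
import random

# Toy vocabulary (assumption: a real system would use the model's tokenizer).
VOCAB = [f"tok{i}" for i in range(50)]


def green_list(prev_token, key, fraction=0.5):
    """Pseudo-randomly pick a 'green' subset of VOCAB, seeded by key + previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))


def choose_stream(prev_token):
    """Toy stand-in for a semantic gate (assumption): parity of the token id."""
    return "stream_a" if int(prev_token[3:]) % 2 == 0 else "stream_b"


def generate(length, seed=0):
    """Emit tokens, always sampling from the green list of the gated stream."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(length):
        key = choose_stream(tokens[-1])
        candidates = sorted(green_list(tokens[-1], key))  # sorted for determinism
        tokens.append(rng.choice(candidates))
    return tokens
```

Every generated token lands in the green list of whichever stream the gate selected for its context. A real system would more likely bias the model's logits toward the green list than sample uniformly from it.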

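Framework: DualGuard's architecture has two main modules, a watermark generation module and a watermark detection module. The generation module produces watermark signals based on the semantic content of the input text; the detection module identifies and traces the watermark.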
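The sketch below shows one plausible shape for a detection module that reports evidence per stream. It assumes a generic green-list watermark scored with a one-proportion z-test, which is a swapped-in technique and not DualGuard's documented detector. How the two streams are combined to detect or trace spoofing is not specified in this summary, and the sketch does not attempt it. All names (`green_list`, `make_text`, `z_score`, `detect`) and the threshold are hypothetical.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary (assumption)
GAMMA = 0.5  # fraction of the vocabulary marked green at each step


def green_list(prev_token, key):
    """Keyed pseudo-random green subset, seeded by key + previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * GAMMA)))


def make_text(key, length, seed=0):
    """Generate toy text watermarked with a single stream key."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1], key))))
    return tokens


def z_score(tokens, key):
    """How far the green-token count exceeds what chance (GAMMA) predicts."""
    n = len(tokens) - 1
    hits = sum(tokens[i + 1] in green_list(tokens[i], key) for i in range(n))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def detect(tokens, keys=("stream_a", "stream_b"), threshold=4.0):
    """Return {key: (z, flagged)} so each stream's evidence is visible separately."""
    results = {}
    for key in keys:
        z = z_score(tokens, key)
        results[key] = (z, z > threshold)
    return results


if __name__ == "__main__":
    print(detect(make_text("stream_a", 200)))
```

Reporting one score per key, instead of a single yes/no, is what lets a detector see which signal survived and which did not.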

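Key innovation: DualGuard's novelty is its dual-stream watermarking mechanism, which handles paraphrase and spoofing attacks at the same time, a first among existing watermarking algorithms.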

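Key design: DualGuard uses a dedicated loss function to optimize watermark robustness and uses deep learning models to generate and detect the watermark signals, with the aim of keeping the watermark reliable and text quality high.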

📊 Experimental highlights

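According to the reported results, DualGuard achieves over 90% watermark detectability across multiple datasets, maintains up to 85% robustness under spoofing attacks, and improves on existing methods by about 15%, indicating an advantage in practical applications.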
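The summary does not say which metric "detectability" refers to. Watermark evaluations commonly report the true-positive rate at a fixed false-positive rate, so the following is a minimal sketch of that metric, offered as an assumption about how such a percentage might be computed and not as the paper's evaluation protocol. The function name and the sample scores are hypothetical.

```python
def tpr_at_fpr(negative_scores, positive_scores, max_fpr=0.01):
    """Pick a threshold whose false-positive rate on negatives is at most
    max_fpr, then report the share of positives scoring above it."""
    neg = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(neg))  # negatives permitted above the threshold
    threshold = neg[allowed] if allowed < len(neg) else float("-inf")
    # A text is flagged when its score is strictly greater than the threshold.
    tpr = sum(s > threshold for s in positive_scores) / len(positive_scores)
    return threshold, tpr


if __name__ == "__main__":
    human_scores = [0, 1, 2, 3, 4, 5, 6, 7]  # detector scores on unwatermarked text
    watermarked_scores = [4, 6, 8, 9]        # detector scores on watermarked text
    print(tpr_at_fpr(human_scores, watermarked_scores, max_fpr=0.25))
```

With these toy scores and a 25% false-positive budget, two of the eight unwatermarked scores may exceed the threshold, so the threshold lands at 5 and three of the four watermarked scores are flagged.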

🎯 Application scenarios

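DualGuard has potential applications in cloud computing services, content creation platforms, and intellectual property protection. By defending against model abuse, it can strengthen user trust in large language models and support their adoption in commercial and academic settings.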

📄 Abstract (original)

With the rapid development of cloud-based services, large language models (LLMs) have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs the adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.