STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

Authors: Shaoxiong Guo, Tianyi Du, Lijun Li, Yuyao Wu, Jie Li, Jing Shao

Category: cs.AI

Published: 2025-09-30

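💡 One-sentence summary

Proposes STaR-Attack, a multi-turn spatio-temporal narrative reasoning attack framework that targets the generation-understanding coupling vulnerability in unified multimodal models.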

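🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: multimodal models, adversarial attacks, security vulnerabilities, narrative reasoning, generation-understanding coupling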

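📋 Key points

Existing attack methods are usually limited to a single modality and rely on prompt rewriting that introduces semantic drift, overlooking the generation-understanding coupling vulnerability unique to unified multimodal models (UMMs).
STaR-Attack uses the three-act narrative theory to generate a spatio-temporal narrative image sequence built around a malicious event, then uses a question-and-answer game to induce the model to expose its safety weaknesses.
In experiments, STaR-Attack reaches an attack success rate of up to 93.06% on Gemini-2.0-Flash, surpassing the strongest prior baseline, FlipAttack.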

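📝 Abstract (translated from the Chinese)

Unified multimodal understanding and generation models (UMMs) show remarkable capability in both understanding and generation tasks. However, we find a vulnerability in the generation-understanding coupling of UMMs. An attacker can use the generation function to craft an information-rich adversarial image and then use the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods for malicious instructions are usually limited to a single modality and rely on prompt rewriting with semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits the unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and post-event scenes while concealing the malicious event as the climax. When the attack is executed, the first two rounds use the UMM's generation capability to produce images of these scenes. An image-based question-and-answer game is then introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant question given the narrative context. Extensive experiments show that STaR-Attack consistently outperforms prior methods, reaching up to 93.06% attack success rate (ASR) on Gemini-2.0-Flash and surpassing the strongest prior baseline, FlipAttack. Our work reveals a critical but underexplored safety vulnerability and underlines the need for safety alignment in UMMs.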

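🔬 Method details

Problem definition: The paper addresses safety vulnerabilities in unified multimodal models (UMMs). Existing attack methods focus mainly on single-modality inputs or on prompt engineering that introduces semantic drift. They do not exploit the coupling between the generation and understanding capabilities of UMMs, so they are less effective and easier for defense mechanisms to detect.

Core idea: Use the UMM's generation capability to build a spatio-temporal narrative scene around a malicious event, then use a question-and-answer game to induce the model to expose its safety weaknesses. Because the malicious event stays hidden in the narrative context and the original query is not rewritten, the approach avoids semantic drift and improves both the stealth and the success rate of the attack.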

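Technical framework: The STaR-Attack framework consists of the following stages:
1. Narrative scene generation: using the three-act narrative theory, produce the pre-event and post-event scenes, with the malicious event hidden between them as the climax.
2. Image generation: use the UMM's image generation capability to turn the narrative scenes into concrete images.
3. Question-and-answer game: design an image-based question-and-answer game that embeds the original malicious question among benign candidates, forcing the model to select and answer a question based on the narrative context.
4. Attack execution: guide the model step by step, over multiple rounds of interaction, into exposing its safety weaknesses.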

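Key innovations:
1. Cross-Modal Generative Injection (CMGI): the paper identifies and exploits the generation-understanding coupling vulnerability of UMMs, which it describes as previously unexplored.
2. Spatio-temporal narrative reasoning attack: embedding the malicious event in a spatio-temporal narrative scene improves the stealth and effectiveness of the attack.
3. Multi-turn interactive attack framework: multiple rounds of interaction gradually guide the model into exposing its safety weaknesses, which makes the attack more robust.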

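Key design choices:
1. Three-act narrative theory: the narrative scenes are built on the three-act structure so that the story is complete and coherent.
2. Concealment of the malicious event: the malicious event is hidden in the narrative context rather than stated directly, which lowers the risk of detection.
3. Question-and-answer game design: the questions are tied to the narrative scenes so that the game can effectively induce the model to expose its safety weaknesses.
4. Candidate question selection: the malicious question is mixed in with several benign questions so that it is never presented on its own, which improves stealth, while the narrative context makes it the most relevant candidate for the model to select.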

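📊 Experimental highlights

According to the paper, STaR-Attack consistently outperforms prior attack methods. On Gemini-2.0-Flash it reaches an attack success rate (ASR) of up to 93.06%, surpassing the strongest prior baseline, FlipAttack. This indicates that STaR-Attack can exploit the generation-understanding coupling vulnerability of UMMs to achieve a high jailbreak success rate, and that the safety alignment of UMMs still faces substantial challenges.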
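The ASR figure quoted above is a ratio metric. As a minimal sketch of how such a number is typically computed, assuming ASR is the percentage of attack attempts that a judge marks as successful: the function name `attack_success_rate` and the example verdicts below are hypothetical, and this summary does not say how the paper's judge decides success.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return ASR as a percentage: successful attempts / total attempts * 100.

    Each element of `judgments` is one attempt's verdict (True = judged
    successful). How the verdict is produced is outside this sketch.
    """
    if not judgments:
        raise ValueError("ASR is undefined for an empty set of attempts")
    successes = sum(1 for judged_success in judgments if judged_success)
    return 100.0 * successes / len(judgments)


# Hypothetical verdicts, for illustration only (not data from the paper).
example_verdicts = [True, True, False, True]
print(f"ASR = {attack_success_rate(example_verdicts):.2f}%")  # ASR = 75.00%
```

Reporting the number of attempts alongside the percentage makes results on different models or benchmarks easier to compare.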

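🎯 Application scenarios

The results can be used to evaluate and improve the safety of large multimodal models, particularly against malicious content injection and adversarial attacks. STaR-Attack gives a fuller picture of the safety weaknesses of UMMs and can guide the development of stronger defense mechanisms. The work can also help make multimodal models more reliable in safety-sensitive applications such as intelligent customer service and content moderation.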

📄 Abstract (original)

Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. The attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods on malicious instructions are often limited to a single modality while also relying on prompt rewriting with semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and the post-event scenes while concealing the malicious event as the hidden climax. When executing the attack strategy, the opening two rounds exploit the UMM's generative ability to produce images for these scenes. Subsequently, an image-based question guessing and answering game is introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant one given the narrative context. Extensive experiments show that STaR-Attack consistently surpasses prior approaches, achieving up to 93.06% ASR on Gemini-2.0-Flash and surpasses the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underdeveloped vulnerability and highlights the need for safety alignments in UMMs.
