Counterfactual Token Generation in Large Language Models

Authors: Ivi Chatzi, Nina Corvelo Benz, Eleni Straitouri, Stratis Tsirtsis, Manuel Gomez-Rodriguez

Categories: cs.LG, cs.AI, cs.CL

Published: 2024-09-25 (updated: 2025-03-24)

Notes: Accepted at CLeaR 2025

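💡 One-sentence takeaway

Proposes a causal model of token generation built on the Gumbel-Max SCM, giving LLMs the ability to generate counterfactual tokens.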

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: large language models, counterfactual reasoning, causal models, Gumbel-Max SCM, token generation, bias detection

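📋 Key points

Existing large language models lack counterfactual reasoning ability: they cannot reason about alternatives to tokens they have already generated.
The paper proposes a token generation method based on the Gumbel-Max structural causal model that enables counterfactual token generation without any additional training.
Experiments on Llama 3 8B-Instruct and Ministral-8B-Instruct show that the method works, and a bias-detection application offers insight into the model of the world these LLMs construct.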

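📝 Abstract (summary)

This paper aims to give large language models (LLMs) the ability to reason counterfactually. Current LLMs are stateless and cannot reason about counterfactual alternatives to tokens they have already generated. To address this, the authors develop a causal model of token generation that builds on the Gumbel-Max structural causal model (SCM). The model lets an LLM perform counterfactual token generation at almost no extra cost compared with vanilla token generation, is simple to implement, and requires neither fine-tuning nor prompt engineering. The authors implement the model on Llama 3 8B-Instruct and Ministral-8B-Instruct and carry out a qualitative and a quantitative analysis of counterfactually generated text. Finally, they demonstrate an application of counterfactual token generation to bias detection, which yields insights about the model of the world constructed by LLMs.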

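🔬 Method in detail

Problem definition: LLMs generate text autoregressively and cannot reason counterfactually about tokens they have already produced. For example, if the model generated "Captain Lyra", there is no way to know how the story would have unfolded had it generated "Captain Maeve" instead. Because today's LLMs are stateless, they cannot carry out this kind of reasoning.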

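Core idea: The paper models the token generation process with a structural causal model (SCM), specifically a Gumbel-Max SCM. Treating the choice of each token as a causal mechanism makes the causal relationships in generation explicit, so that one can intervene on the process and reason about counterfactual scenarios.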
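In symbols, one common way to write a Gumbel-Max mechanism for the token at position i is the following. The notation (V for the vocabulary, s_{<i} for the preceding tokens, U_{i,t} for the noise) is introduced here for illustration and is not taken from the paper.

```latex
T_i = \operatorname*{arg\,max}_{t \in V} \bigl( \log p(t \mid s_{<i}) + U_{i,t} \bigr),
\qquad U_{i,t} \overset{\text{i.i.d.}}{\sim} \mathrm{Gumbel}(0, 1)
```

Marginally, T_i is distributed according to p(· | s_{<i}), so ordinary sampling is unchanged. The noise U is the quantity a counterfactual query would hold fixed.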

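Technical framework: The framework builds on the Gumbel-Max SCM and represents the LLM's token generation process as a causal graph. For each token, the model computes a logits vector and then selects a token through the Gumbel-Max mechanism. Counterfactual reasoning intervenes on this process: a token is replaced, which changes the logits at every later step, and the rest of the text is regenerated while the Gumbel noise from the original generation is reused. None of this modifies the LLM's parameters; the intervention happens purely at inference time.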
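A single Gumbel-Max selection step can be sketched as follows. This is a minimal illustration with NumPy, not the authors' code: the three-token vocabulary, the logits values, and the helper name `gumbel_max_sample` are all assumptions made for the example. The sketch also checks empirically that Gumbel-Max selection reproduces softmax sampling.

```python
import numpy as np

def gumbel_max_sample(logits, noise):
    """Return the index of the token maximizing logit + Gumbel noise."""
    return int(np.argmax(logits + noise))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])  # hypothetical logits over a 3-token vocabulary

# Draw many independent noise vectors and select a token for each one.
n = 20000
noise = rng.gumbel(size=(n, logits.size))
samples = np.argmax(logits + noise, axis=1)

# Empirical selection frequencies versus the softmax probabilities.
freqs = np.bincount(samples, minlength=logits.size) / n
softmax = np.exp(logits) / np.exp(logits).sum()
print(np.round(freqs, 3), np.round(softmax, 3))
```

Because the randomness lives entirely in `noise`, the same noise vector always yields the same token for the same logits, which is what makes the selection step replayable.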

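Key innovation: The main contribution is applying a structural causal model to the LLM token generation process, which makes counterfactual token generation possible. The method needs neither fine-tuning of the LLM nor elaborate prompt engineering, and it is simple and cheap to implement. It also offers a new lens on how LLMs generate text and a new tool for applications such as bias detection.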

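Key design: The central design choice is the Gumbel-Max mechanism, which models token selection as a deterministic function of the logits and an independent noise term. For each token, the model computes a logits vector, adds a vector of Gumbel noise, and selects the token with the largest sum. A counterfactual scenario is simulated by forcing a different token at some position and then regenerating the text that follows with the same Gumbel noise, so that any change in the output can be attributed to the intervention rather than to fresh randomness.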
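The procedure described above can be sketched end to end on a toy model. Everything here is a labeled assumption: `toy_logits` is a hypothetical stand-in for an LLM forward pass, `VOCAB` is an invented six-token vocabulary, and storing one noise vector per position is just one way to keep the noise fixed. It is not the authors' implementation.

```python
import numpy as np

VOCAB = ["Lyra", "Maeve", "sailed", "north", "south", "."]

def toy_logits(prefix):
    """Hypothetical stand-in for an LLM: deterministic logits given the prefix."""
    seed = sum((i + 1) * VOCAB.index(t) for i, t in enumerate(prefix))
    return np.random.default_rng(seed).normal(size=len(VOCAB))

def generate(prefix, noise):
    """Autoregressive Gumbel-Max decoding; noise[i] is used at position i."""
    tokens = list(prefix)
    for i in range(len(tokens), len(noise)):
        scores = toy_logits(tokens) + noise[i]
        tokens.append(VOCAB[int(np.argmax(scores))])
    return tokens

# Sample the noise once and keep it: one Gumbel vector per position.
rng = np.random.default_rng(42)
noise = rng.gumbel(size=(6, len(VOCAB)))

factual = generate(["Lyra"], noise)

# Intervention: replace the first token, then regenerate with the SAME noise.
counterfactual = generate(["Maeve"], noise)

print(" ".join(factual))
print(" ".join(counterfactual))
```

With the noise held fixed, regenerating from any prefix of the factual sequence reproduces the factual sequence exactly, so differences in the counterfactual output trace back to the swapped token alone. For a real LLM, re-seeding the sampler with the original seed might be a cheaper way to recover the same noise than storing it, although that is a design assumption rather than something stated in the source.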

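📊 Experimental highlights

The paper runs experiments on Llama 3 8B-Instruct and Ministral-8B-Instruct to show that the method is effective for counterfactual token generation. Qualitative and quantitative analyses of the counterfactually generated text support the claim that the output is meaningful and consistent. The paper also applies the method to bias detection, yielding insights about the model of the world constructed by LLMs.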

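🎯 Application scenarios

The results could be applied in several areas, including bias detection, fairness assessment, and interpretability analysis. Counterfactual token generation makes it possible to examine how an LLM behaves when a single aspect of the context changes, which can expose latent biases and unfair treatment. The method could also be used to generate more creative and diverse text and to study the robustness of LLMs.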

📄 Abstract (original)

"Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself." Although this story, generated by a large language model, is captivating, one may wonder -- how would the story have unfolded if the model had chosen "Captain Maeve" as the protagonist instead? We cannot know. State-of-the-art large language models are stateless -- they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. Our model allows any large language model to perform counterfactual token generation at almost no cost in comparison with vanilla token generation, it is embarrassingly simple to implement, and it does not require any fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-Instruct and Ministral-8B-Instruct and conduct a qualitative and a quantitative analysis of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.
