CRANE: Knowledge Editing for Reasoning MLLMs

Authors: Han Huang, Hao Wang, Mengqi Zhang, Shu Wu, Qiang Liu, Liang Wang

Categories: cs.CV, cs.CL

Published: 2026-06-08

Comments: 10 pages, 5 figures

💡 One-Sentence Summary

Proposes CRANE, a retrieval-augmented framework for knowledge editing in reasoning multimodal large language models.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

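Keywords: knowledge editing, reasoning models, multimodal learning, retrieval augmentation, cognitive routing, large language models, benchmark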

📋 Key Points

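Existing knowledge editing methods fall short on reasoning MLLMs: edits that look successful under teacher-forcing accuracy can fail once the model's reasoning process is examined.
CRANE combines a modality-aware dual-library retrieval system with a Cognitive Routing Reward and requires no per-edit parameter modification.
On ReasonEdit-Bench, CRANE achieves 96.9% Grounded Success on conflict scenarios and 96.9% intermediate entity usage in multi-hop chains, with 97.6% text-locality and 68.1% image-locality Edit Independence.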

📝 Abstract (Summary)

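The emergence of reasoning multimodal large language models (MLLMs) introduces a new challenge for knowledge editing. Methods that score well on teacher-forcing accuracy (up to 100%) can fail severely once the model's reasoning process is examined (Grounded Success as low as 0%). The paper identifies three failure modes (Structural Collapse, Cognitive Dissonance, and Shallow Internalization) and proposes a CoT-aware evaluation protocol together with the ReasonEdit-Bench benchmark. CRANE combines a dual-library retrieval system with a two-phase training strategy and performs knowledge editing without per-edit parameter modification. In experiments, CRANE reaches 96.9% Grounded Success on conflict scenarios.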

🔬 Method Details

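Problem definition: The paper addresses knowledge editing for reasoning MLLMs, which generate explicit chain-of-thought (CoT) reasoning before answering. Existing methods can fail to inject knowledge once the reasoning is examined, in three ways: Structural Collapse (weight-modifying methods destroy the CoT format), Cognitive Dissonance (the reasoning chain rejects the injected edit fact based on visual evidence), and Shallow Internalization (edits succeed on exact queries but fail on rephrased or multi-hop variants).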
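The gap between teacher-forcing accuracy and Grounded Success can be illustrated with a toy check. The following is a minimal sketch, not the paper's evaluation protocol: it assumes a response separates its reasoning chain from its final answer with hypothetical `<think>...</think>` tags, and it counts an edit as grounded only if the CoT format survives, the chain mentions the injected fact, and the final answer matches the edit target. The function name, the tag convention, and the string-matching criteria are all assumptions made for illustration.

```python
import re

# Hypothetical CoT format: reasoning inside <think>...</think>, answer after it.
COT_PATTERN = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)


def grounded_success(response: str, edit_target: str) -> dict:
    """Toy check; the criteria are illustrative assumptions, not the paper's protocol."""
    match = COT_PATTERN.fullmatch(response.strip())
    if match is None:
        # No parseable reasoning chain: treated here as Structural Collapse.
        return {"format_ok": False, "chain_uses_edit": False, "grounded": False}

    chain, answer = match.group(1), match.group(2)
    target = edit_target.lower()
    chain_uses_edit = target in chain.lower()  # does the chain mention the edit fact?
    answer_ok = target in answer.lower()       # does the final answer follow it?
    return {
        "format_ok": True,
        "chain_uses_edit": chain_uses_edit,
        "grounded": chain_uses_edit and answer_ok,
    }


if __name__ == "__main__":
    # Chain mentions the injected fact but overrides it with visual evidence.
    dissonant = "<think>The note says Lyon, but the image shows Paris.</think> Paris"
    print(grounded_success(dissonant, "Lyon"))
```

In this toy setup, a response whose chain cites the injected fact and then overrides it with visual evidence is marked as not grounded, which is one simple way to picture Cognitive Dissonance.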

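Core idea: CRANE takes a retrieval-augmented approach. It pairs a dual-library retrieval system with a Cognitive Routing Reward, so that no per-edit parameter modification is needed when new edits are applied.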

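Framework: CRANE is trained in two phases. Supervised Fine-Tuning (SFT) first provides structural initialization. GRPO with a Cognitive Routing Reward then trains the model to arbitrate between visual priors and injected edit facts.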
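GRPO scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows that group-wise normalization next to a toy routing-style reward. The reward function is a hypothetical stand-in, since this summary does not specify the form of the Cognitive Routing Reward; the use of the population standard deviation and the `eps` stabilizer are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import List


def toy_routing_reward(answer: str, edit_fact: str, visual_prior: str,
                       edit_applies: bool) -> float:
    """Hypothetical reward: follow the edit fact when an edit applies,
    otherwise follow the visual prior."""
    expected = edit_fact if edit_applies else visual_prior
    return 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers for one prompt where an edit applies.
    answers = ["Lyon", "Paris", "Lyon", "Paris"]
    rewards = [toy_routing_reward(a, "Lyon", "Paris", edit_applies=True) for a in answers]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Responses that follow the expected source receive positive advantages and the rest receive negative ones. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.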

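Key innovation: CRANE's retrieval-augmented design supplies edits at inference time, so no per-edit parameter modification is required. This avoids the trade-off the paper observes in prior methods, where approaches that generalize (FT, LoRA) trigger format collapse while approaches without deep modification cannot generalize.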

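Key design: CRANE uses a modality-aware dual-library retrieval system to fetch relevant edit facts, and a Cognitive Routing Reward that trains the model to arbitrate between visual priors and injected edit facts during reasoning.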
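A modality-aware dual-library lookup can be pictured as two separate stores, one keyed by text embeddings and one keyed by image embeddings, with each query routed to the store that matches its modality. The sketch below is an assumption-heavy illustration: the `DualLibrary` class, the cosine-similarity scoring, the threshold, and the toy 3-dimensional vectors are hypothetical and not taken from the paper, where real encoder embeddings would take their place.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DualLibrary:
    """Hypothetical edit store with one library per modality."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.libraries: Dict[str, List[Tuple[Vector, str]]] = {"text": [], "image": []}
        self.threshold = threshold  # assumed minimum similarity for a hit

    def add_edit(self, modality: str, key: Vector, fact: str) -> None:
        # Adding an edit only appends to a library; no model weights change.
        self.libraries[modality].append((key, fact))

    def retrieve(self, modality: str, query: Vector) -> Optional[str]:
        """Return the best-matching fact in the given library, or None."""
        best_fact, best_score = None, self.threshold
        for key, fact in self.libraries[modality]:
            score = cosine(query, key)
            if score >= best_score:
                best_fact, best_score = fact, score
        return best_fact


if __name__ == "__main__":
    store = DualLibrary()
    store.add_edit("image", [1.0, 0.0, 0.0], "The landmark is in Lyon.")
    print(store.retrieve("image", [0.9, 0.1, 0.0]))  # close to the stored key
    print(store.retrieve("image", [0.0, 1.0, 0.0]))  # unrelated query
```

Returning `None` for unrelated queries means the model sees no edit fact for them, which is one plausible way to keep unrelated inputs unaffected by stored edits.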

📊 Experimental Highlights

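On ReasonEdit-Bench, CRANE achieves 96.9% Grounded Success on conflict scenarios and 96.9% intermediate entity usage in multi-hop chains, with 97.6% text-locality and 68.1% image-locality Edit Independence. On the out-of-distribution MMEVOKE benchmark, CRANE reaches 87.0% under gold retrieval.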

🎯 Application Scenarios

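CRANE targets knowledge editing for reasoning MLLMs and may benefit applications that depend on up-to-date facts during complex reasoning, such as question answering and image understanding. The approach may also inform more efficient knowledge editing techniques for multimodal systems.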

📄 Abstract (Original)

The emergence of reasoning multimodal large language models (MLLMs), which generate explicit chain-of-thought (CoT) reasoning before producing answers, has introduced a new challenge for knowledge editing: methods that appear successful under traditional metrics (teacher-forcing accuracy up to 100%) can fail severely when the model's reasoning process is examined (Grounded Success as low as 0%). We identify three failure modes: (1) Structural Collapse, where weight-modifying methods destroy the CoT format; (2) Cognitive Dissonance, where the model's reasoning chain actively rejects the injected edit fact based on visual evidence; and (3) Shallow Internalization, where methods succeed on exact queries but fail on rephrase or multi-hop variants. On reasoning MLLMs, these modes interact: methods that generalize (FT, LoRA) trigger format collapse, while methods without deep modification cannot generalize. To expose these failures, we propose a CoT-aware evaluation protocol and construct ReasonEdit-Bench, with conflict stratification, multi-level probes, and multi-hop portability tests. We propose CRANE, a retrieval-augmented framework that requires no per-edit parameter modification. CRANE combines a modality-aware dual-library retrieval system with a two-phase training strategy: Supervised Fine-Tuning (SFT) for structural initialization, followed by GRPO with a Cognitive Routing Reward that trains the model to arbitrate between visual priors and injected edit facts. On ReasonEdit-Bench, CRANE achieves 96.9% Grounded Success on conflict scenarios and 96.9% intermediate entity usage in multi-hop chains, with 97.6% text-locality and 68.1% image-locality Edit Independence. On the out-of-distribution MMEVOKE benchmark, CRANE reaches 87.0% under gold retrieval.
