DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

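Authors: Hanqing Yang, Qiang Zhou, Yongchao Du, Sashuai Zhou, Zhibin Wang, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng

Categories: cs.CV, cs.AI

Published: 2026-04-28

💡 One-sentence summary

DDA-Thinker: decoupled dual-atomic reinforcement learning for reasoning-driven image editing.

🎯 Matched area: Pillar 2, RL Algorithms and Architecture (RL & Architecture)

Keywords: image editing, reinforcement learning, reasoning-driven, decoupled framework, dual-atomic reward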

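📋 Key points

Existing image editing models achieve strong visual fidelity but struggle on tasks that require complex reasoning.
The DDA-Thinker framework decouples the planning module (Thinker) from the generative module (Editor) and uses dual-atomic reinforcement learning to improve the Thinker's reasoning.
On reasoning-driven image editing benchmarks, DDA-Thinker substantially improves performance and lets a community model compete with strong proprietary models.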

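📝 Abstract (summary)

This paper proposes DDA-Thinker, a Thinker-centric framework for improving image editing models on tasks that require complex reasoning. The framework decouples the planning module (Thinker) from a fixed generative model (Editor), so that the Thinker can be optimized independently and its contribution assessed. To guide the Thinker, the authors introduce a dual-atomic reinforcement learning framework that decomposes feedback into a cognitive-atomic reward, which assesses the quality of the Thinker's executable plan, and a visual-atomic reward, which assesses the quality of the final image. Both rewards are implemented through verifiable checklists. To improve checklist quality, checklist synthesis is grounded in the source image, the user instruction, and a rational reference description of the ideal post-edit scene. The authors also develop a two-stage data curation pipeline that first synthesizes a diverse, reasoning-focused dataset and then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. On reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, the method substantially improves overall performance and enables a community model to achieve results competitive with strong proprietary models.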

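🔬 Method details

Problem definition: Existing image editing models produce good visual results but fall short on tasks that require complex reasoning, for example modifying several objects in an image according to an instruction while keeping the relationships among them logically consistent. Existing methods reason and plan poorly, so the edited results are unsatisfactory.

Core idea: Split image editing into two independent parts, reasoning and planning (Thinker) and image generation (Editor). Focusing optimization on the planning module improves performance on complex editing tasks more effectively. The Thinker is trained with reinforcement learning so that it produces a sound editing plan from its inputs (the image and the instruction).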

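Technical framework: DDA-Thinker consists of two main modules, the Thinker and the Editor. The Editor is a fixed, pretrained generative model that edits the image according to the plan produced by the Thinker. The Thinker is the module trained with reinforcement learning: it takes the user instruction and the source image and outputs an executable editing plan. Under the dual-atomic reinforcement learning framework, the Thinker receives feedback from a cognitive-atomic reward (plan quality) and a visual-atomic reward (final image quality), and uses it to improve its reasoning and planning.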
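
The decoupling described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `Plan` structure, the `Thinker` and `Editor` type aliases, and the toy stand-ins are illustrative assumptions, not the paper's interfaces, and images are represented as plain strings to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Plan:
    """Executable plan produced by the Thinker (hypothetical structure)."""
    steps: List[str]


# Hypothetical interfaces: images are plain strings in this sketch.
Thinker = Callable[[str, str], Plan]  # (source_image, instruction) -> plan
Editor = Callable[[str, Plan], str]   # (source_image, plan) -> edited image


def run_episode(thinker: Thinker, editor: Editor,
                image: str, instruction: str) -> Tuple[Plan, str]:
    """One rollout: the Thinker plans, then the fixed Editor executes the plan."""
    plan = thinker(image, instruction)
    edited = editor(image, plan)
    return plan, edited


def toy_thinker(image: str, instruction: str) -> Plan:
    # Stand-in for a trainable policy: splits the instruction into steps.
    return Plan(steps=[s.strip() for s in instruction.split(" and ") if s.strip()])


def toy_editor(image: str, plan: Plan) -> str:
    # Stand-in for a frozen generative model: records the applied steps.
    return image + "|" + "|".join(plan.steps)
```

Because the Editor is held fixed, any change in the edited output across training can be attributed to the Thinker's plan, which is the point of the decoupled setup.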

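Key innovation: The dual-atomic reinforcement learning framework. It decomposes the conventional single reward signal into two independent atomic rewards, one for the soundness of the plan and one for the quality of the final image. According to this summary, the decomposition makes reinforcement learning more stable and effective, because the Thinker gets a clearer signal about how its behavior affects each aspect. Grounding checklist synthesis in a rational reference description also improves the quality of the reward signal.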
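
A minimal sketch of how two checklist-based atomic rewards might be computed and combined. The weighted sum and its default weights are assumptions made for illustration; the source does not state how the two rewards are aggregated. The checklist items are modeled as simple string predicates, whereas a real system would presumably use a model-based judge.

```python
from typing import Callable, Sequence, Tuple

Check = Callable[[str], bool]  # hypothetical: one verifiable checklist item


def checklist_score(checks: Sequence[Check], candidate: str) -> float:
    """Fraction of checklist items the candidate satisfies (0.0 if no items)."""
    if not checks:
        return 0.0
    return sum(1 for check in checks if check(candidate)) / len(checks)


def dual_atomic_reward(plan_text: str, image_desc: str,
                       cognitive_checks: Sequence[Check],
                       visual_checks: Sequence[Check],
                       w_cog: float = 0.5,   # assumed weight, not from the paper
                       w_vis: float = 0.5    # assumed weight, not from the paper
                       ) -> Tuple[float, float, float]:
    """Return (combined, cognitive, visual) rewards for one rollout."""
    r_cog = checklist_score(cognitive_checks, plan_text)   # plan quality
    r_vis = checklist_score(visual_checks, image_desc)     # final image quality
    return w_cog * r_cog + w_vis * r_vis, r_cog, r_vis
```

Keeping the two scores separate in the return value makes it possible to see whether a low reward comes from a poor plan or from a poor rendering of a good plan.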

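Key design: Both atomic rewards are implemented through verifiable checklists, which are synthesized from the source image, the user instruction, and a rational reference description of the ideal post-edit scene. The cognitive-atomic reward checks the Thinker's executable plan, and the visual-atomic reward checks the final image. This summary names PPO (Proximal Policy Optimization) as the reinforcement learning algorithm, although the abstract does not specify one. The data curation pipeline has two stages: it first synthesizes a diverse, reasoning-focused dataset, then applies difficulty-aware refinement to select the samples used for training.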
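
The source does not say how difficulty is measured in the refinement stage. One plausible reading, sketched below purely as an assumption, is to estimate difficulty from the mean reward of several rollouts per sample and to keep samples that are neither always solved nor always failed. The function name, the thresholds, and the input format are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def refine_by_difficulty(rollout_rewards: Dict[str, List[float]],
                         low: float = 0.1,    # assumed lower bound
                         high: float = 0.9    # assumed upper bound
                         ) -> List[str]:
    """Keep sample ids whose mean rollout reward lies in [low, high].

    Samples with a mean near 0 (too hard) or near 1 (too easy) give little
    learning signal under this assumed criterion and are dropped.
    """
    kept = []
    for sample_id, rewards in rollout_rewards.items():
        if not rewards:
            continue  # no rollouts, so difficulty cannot be estimated
        if low <= mean(rewards) <= high:
            kept.append(sample_id)
    return kept
```

For example, a sample whose rollouts score 0.0 and 1.0 has a mean of 0.5 and is kept, while samples that always score 0.0 or always score 1.0 are filtered out.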

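📊 Experimental highlights

The experiments show that DDA-Thinker substantially improves performance on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench. The method enables a community model to achieve results competitive with strong proprietary models, which supports its effectiveness for reasoning-driven image editing. Specific performance figures are reported in the paper.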

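🎯 Application scenarios

The work applies to image editing scenarios that require reasoning, such as intelligent image restoration, creative image compositing, and instruction-based image manipulation. Its potential value lies in making image editing more intelligent, lowering the barrier for users, and giving creative design more capable tools. In the future, the technique could extend to areas such as virtual reality and augmented reality, enabling more natural and intelligent human-computer interaction.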

📄 Abstract (original)

Recent image editing models have achieved strong visual fidelity but often struggle with tasks requiring complex reasoning. To investigate and enhance the reasoning-grounded planning for image editing, we propose DDA-Thinker, a Thinker-centric framework designed for the independent optimization of a planning module (Thinker) over a fixed generative model (Editor). This decoupled Thinker-centric paradigm facilitates a controlled analysis of the planning module and makes its contribution under a fixed Editor easier to assess. To effectively guide this Thinker, we introduce a dual-atomic reinforcement learning framework. This framework decomposes feedback into two distinct atomic rewards implemented through verifiable checklists: a cognitive-atomic reward to directly assess the quality of the Thinker's executable plan, which serves as the actionable outcome of the Thinker's reasoning, and a visual-atomic reward to assess the final image quality. To improve checklist quality, our checklist synthesis is grounded not only in the source image and user instruction but also in a rational reference description of the ideal post-edit scene. To support this training, we further develop a two-stage data curation pipeline that first synthesizes a diverse and reasoning-focused dataset, then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. Extensive experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, demonstrate that our approach substantially improves overall performance. Our method enables a community model to achieve results competitive with strong proprietary models, highlighting the practical potential of Thinker-centric optimization under a fixed-editor setting.
