Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

作者: Amirtaha Amanzadi, Zahra Dehghanian, Hamid Beigy, Hamid R. Rabiee

分类: cs.CV, cs.AI

发布日期: 2025-10-07

备注: Project code: http://github.com/amir-aman/FusionDetect

🔗 代码/项目: GITHUB

💡 一句话要点

提出FusionDetect，融合CLIP和DINOv2特征，提升伪图像检测的泛化能力。

🎯 匹配领域: 支柱九：具身大模型 (Embodied Foundation Models)

关键词: 伪图像检测 生成对抗网络 视觉泛化 CLIP DINOv2 特征融合 OmniGen基准

📋 核心要点

现有伪图像检测方法主要关注跨生成器泛化，忽略了跨视觉领域的泛化能力。
FusionDetect融合CLIP和DINOv2的互补特征，构建适应内容和设计变化的特征空间。
实验表明，FusionDetect在多个基准测试中均优于现有方法，并对图像扰动具有鲁棒性。

📝 摘要（中文）

生成模型的快速发展使得开发能够可靠检测合成图像的检测器变得至关重要。虽然目前大多数工作都集中在跨生成器的泛化上，但我们认为这种观点过于局限。检测合成图像还涉及另一个同样重要的挑战：跨视觉领域的泛化。为了弥合这一差距，我们提出了OmniGen基准。这个全面的评估数据集包含了12个最先进的生成器，提供了一种在真实条件下评估检测器性能的更现实的方法。此外，我们还介绍了一种新方法FusionDetect，旨在解决这两个泛化向量。FusionDetect利用了两个冻结的基础模型：CLIP和DINOv2的优势。通过从这两个互补模型中提取特征，我们开发了一个有凝聚力的特征空间，该空间自然地适应生成器的内容和设计的变化。我们的大量实验表明，FusionDetect不仅提供了一种新的最先进技术，其准确率比最接近的竞争对手高3.87%，并且在已建立的基准上平均精确度高6.13%，而且在OmniGen上的准确率也提高了4.48%，同时对常见的图像扰动具有出色的鲁棒性。我们不仅推出了一种性能最佳的检测器，还推出了一个新的基准和框架，以促进通用AI图像检测。

🔬 方法详解

问题定义：现有伪图像检测方法在跨生成器泛化方面取得了一定进展，但忽略了跨视觉领域的泛化能力。这意味着当检测器遇到与训练数据视觉风格迥异的伪图像时，性能会显著下降。现有方法难以同时兼顾内容和设计上的变化，导致泛化能力不足。

核心思路：FusionDetect的核心思路是利用预训练的视觉基础模型CLIP和DINOv2的互补优势，提取图像的多样化特征。CLIP擅长捕捉图像的语义信息，而DINOv2则擅长捕捉图像的低级视觉特征。通过融合这两种特征，FusionDetect能够更好地适应生成器的内容和设计变化，从而提高泛化能力。

技术框架：FusionDetect的整体框架包括以下几个主要步骤：1) 使用CLIP和DINOv2分别提取输入图像的特征；2) 将提取的特征进行融合，形成一个统一的特征向量；3) 将融合后的特征向量输入到一个分类器中，判断图像是否为伪图像。CLIP和DINOv2模型是冻结的，只训练分类器。

关键创新：FusionDetect的关键创新在于融合了CLIP和DINOv2的特征。这种融合方式能够有效地结合两种模型的优势，从而提高检测器的泛化能力。此外，OmniGen基准的提出也为评估伪图像检测器的泛化能力提供了一个更全面的平台。

关键设计：FusionDetect使用预训练的CLIP和DINOv2模型，无需进行额外的训练。特征融合的方式采用简单的拼接。分类器采用一个简单的多层感知机（MLP）。损失函数采用交叉熵损失函数。OmniGen基准包含12个最先进的生成器，涵盖了各种不同的视觉领域。

🖼️ 关键图片

📊 实验亮点

FusionDetect在多个基准测试中取得了显著的性能提升。在已建立的基准上，FusionDetect的准确率比最接近的竞争对手高3.87%，平均精确度高6.13%。在OmniGen基准上，FusionDetect的准确率提高了4.48%，并且对常见的图像扰动具有出色的鲁棒性。这些结果表明FusionDetect在伪图像检测方面具有很强的竞争力。

🎯 应用场景

FusionDetect可应用于社交媒体平台、新闻网站等场景，用于检测和过滤虚假图像，防止谣言传播和恶意信息扩散。该研究有助于提高数字内容的真实性和可信度，维护网络安全和信息安全。未来可扩展到视频等其他模态的伪造检测。

📄 摘要（原文）

The rapid development of generative models has made it increasingly crucial to develop detectors that can reliably detect synthetic images. Although most of the work has now focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap,we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of evaluating detector performance under realistic conditions. In addition, we introduce a new method, FusionDetect, aimed at addressing both vectors of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP & Dinov2. By deriving features from both complementary models,we develop a cohesive feature space that naturally adapts to changes in both thecontent and design of the generator. Our extensive experiments demonstrate that FusionDetect delivers not only a new state-of-the-art, which is 3.87% more accurate than its closest competitor and 6.13% more precise on average on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen,along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect

Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

💡 一句话要点

📋 核心要点

📝 摘要（中文）

🔬 方法详解

🖼️ 关键图片

📊 实验亮点

🎯 应用场景

📄 摘要（原文）

⭐ 我的收藏

📁 新建收藏夹

⚙️ 管理收藏夹

🔍 搜索论文

🔐 登录 / 注册

👤 用户管理