Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy

Authors: Marcel Roth, Micha V. Nowak, Adrian Krenzer, Frank Puppe

Categories: cs.CV, cs.LG

Published: 2024-10-21 (updated: 2024-12-11)

💡 One-sentence takeaway

Proposes domain-adaptive pre-training to improve medical image classification in gastrointestinal endoscopy.

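🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: gastrointestinal endoscopy, medical image classification, domain adaptation, self-supervised learning, pre-trained models, EVA-02, EndoExtend24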

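📋 Key points

Gastrointestinal endoscopy image analysis is held back by very large image volumes, high image variability, and scarce labeled data.
The paper proposes domain-adaptive pre-training: a foundation model trained with self-supervision on generic images is adapted to the endoscopy domain using a large merged dataset.
EVA-02 is pre-trained on the EndoExtend24 dataset and then trained on the Capsule Endoscopy 2024 Challenge dataset, where it placed third.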

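📝 Abstract (translated)

Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by capturing detailed images of the gastrointestinal tract non-invasively, enabling early disease detection. Its potential is limited, however, by the volume of images produced during a procedure (often up to 1 million over 6-8 hours), which makes automated analysis necessary. The variability of the images, the need for expert annotation, and the scarcity of large, high-quality labeled datasets further constrain current medical image analysis models. To address this, the authors introduce EndoExtend24, a large GIE dataset created by merging ten existing public and private datasets while ensuring patient integrity across splits. EndoExtend24 contains over 226,000 labeled images and provides dynamic class mappings that allow unified training across datasets with differing label granularity, supporting up to 123 distinct pathological findings. They also propose domain-adaptive pre-training of foundation models trained with self-supervision on generic image data, adapting them to GIE diagnosis. Specifically, EVA-02, a ViT-based model trained on ImageNet-22k with masked image modeling (using EVA-CLIP as the MIM teacher), is pre-trained on EndoExtend24 for domain adaptation and finally trained on the Capsule Endoscopy 2024 Challenge dataset. The model placed third in the Capsule Endoscopy 2024 Challenge, with a macro AUC of 0.762 and a balanced accuracy of 37.1% on the test set. These results underline the effectiveness of the domain-adaptive pre-training approach and the EndoExtend24 dataset for GIE diagnostics.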

🔬 Method details

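Problem definition: The paper targets the performance bottleneck in gastrointestinal endoscopy image classification caused by large image volumes, high image variability, and insufficient labeled data. Existing models struggle to make effective use of the limited annotations available and generalize poorly.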

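Core idea: Take a foundation model (EVA-02) that was trained with self-supervision on generic image data and transfer it to the gastrointestinal endoscopy domain. The model is first pre-trained on EndoExtend24, a large in-domain dataset of over 226,000 labeled images, so that it learns domain-specific feature representations before being trained on the downstream classification task.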

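Technical framework: The pipeline has three stages. 1) Dataset construction: ten existing public and private datasets are merged into the large gastrointestinal endoscopy dataset EndoExtend24. 2) Domain-adaptive pre-training: EVA-02, which was itself trained on ImageNet-22k with masked image modeling (MIM) using EVA-CLIP as the teacher, is further pre-trained on EndoExtend24; the abstract does not state which training objective is used in this stage. 3) Downstream training: the domain-adapted model is trained on the Capsule Endoscopy 2024 Challenge dataset to perform the image classification task.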

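Key innovations: 1) EndoExtend24, a large gastrointestinal endoscopy dataset that provides the data basis for domain-adaptive pre-training. 2) Domain-adaptive pre-training of a self-supervised foundation model, which adapts a generic model to a specific medical imaging domain. 3) Dynamic class mappings, which allow unified training across datasets whose labels differ in granularity.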
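The abstract does not describe how the dynamic class mappings are implemented. The following is a minimal sketch of one way such a mapping could work, assuming each dataset-specific label is assigned a path in a coarse-to-fine hierarchy and a training run picks a depth. The dataset names, labels, hierarchy, and the functions `map_label` and `build_training_set` are all hypothetical and are not taken from EndoExtend24.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hierarchy: (dataset, source label) -> path from coarse to fine.
# None of these names come from EndoExtend24; they only illustrate the idea.
LABEL_PATHS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("dataset_a", "polyp"): ("abnormal", "polyp"),
    ("dataset_a", "normal"): ("normal",),
    ("dataset_b", "pedunculated_polyp"): ("abnormal", "polyp", "pedunculated"),
    ("dataset_b", "ulcer"): ("abnormal", "ulcer"),
    ("dataset_c", "abnormal"): ("abnormal",),
}


def is_leaf(path: Tuple[str, ...]) -> bool:
    """A path is a leaf if no other known path refines it further."""
    return not any(
        len(p) > len(path) and p[: len(path)] == path for p in LABEL_PATHS.values()
    )


def map_label(dataset: str, label: str, depth: int) -> Optional[str]:
    """Map a source label to the unified class at the requested depth.

    Returns None when the source label is too coarse for that depth,
    so the sample can be skipped for this training run.
    """
    path = LABEL_PATHS[(dataset, label)]
    if len(path) >= depth:
        return "/".join(path[:depth])
    # Shorter paths are kept only if they cannot be refined any further.
    return "/".join(path) if is_leaf(path) else None


def build_training_set(
    samples: List[Tuple[str, str, str]], depth: int
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Turn (image, dataset, label) triples into (image, class index) pairs."""
    mapped = [(img, map_label(ds, lbl, depth)) for img, ds, lbl in samples]
    kept = [(img, cls) for img, cls in mapped if cls is not None]
    classes = sorted({cls for _, cls in kept})
    index = {cls: i for i, cls in enumerate(classes)}
    return [(img, index[cls]) for img, cls in kept], classes
```

With this design, a shallow depth lets every dataset contribute to a coarse task, while a deeper setting keeps only the samples whose labels are specific enough.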

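Key design: 1) EVA-02 serves as the base model; it is built on the ViT architecture and offers strong feature extraction. 2) EVA-02 was trained with masked image modeling (MIM), a self-supervised objective, using EVA-CLIP as the teacher model. 3) EndoExtend24 is constructed so that patient integrity is preserved across splits, and its dynamic class mappings reconcile labels of differing granularity across the source datasets.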
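The abstract states that EndoExtend24 ensures patient integrity across splits but does not say how. A minimal sketch of one common approach is shown below, assuming that "patient integrity" means all images of a patient land in the same split. The function names, the hash-based assignment, the `seed` string, and the `val_fraction` parameter are illustrative assumptions, not the authors' procedure.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_split(patient_id: str, val_fraction: float = 0.2, seed: str = "split-v1") -> str:
    """Deterministically assign a patient (not an image) to a split.

    Hashing the patient ID keeps the assignment stable across runs and
    guarantees that all images of one patient share the same split.
    """
    digest = hashlib.sha256(f"{seed}:{patient_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_by_patient(
    records: List[Tuple[str, str]], val_fraction: float = 0.2
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (image_path, patient_id) records into splits at patient level."""
    splits: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for image_path, patient_id in records:
        split = assign_split(patient_id, val_fraction)
        splits[split].append((image_path, patient_id))
    return dict(splits)
```

Splitting on the patient ID rather than on individual frames matters for video data such as capsule endoscopy, where neighboring frames from one patient are highly similar and would otherwise leak between training and evaluation sets.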

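📊 Experimental highlights

The model placed third in the Capsule Endoscopy 2024 Challenge, reaching a macro AUC of 0.762 and a balanced accuracy of 37.1% on the test set. The authors take these results as evidence that domain-adaptive pre-training together with the EndoExtend24 dataset improves gastrointestinal endoscopy image classification.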
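For readers unfamiliar with the two reported metrics, here is a minimal sketch of how they are commonly defined, assuming balanced accuracy is the mean per-class recall and macro AUC is the unweighted mean of one-vs-rest AUCs. Whether the challenge computed them in exactly this way is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs; assumes every class occurs in y_true."""
    n_classes = len(probs[0])
    aucs = []
    for c in range(n_classes):
        labels = [1 if t == c else 0 for t in y_true]
        aucs.append(binary_auc(labels, [p[c] for p in probs]))
    return sum(aucs) / n_classes
```

Because both metrics average over classes rather than over samples, rare findings weigh as much as frequent ones, which is why a balanced accuracy can be much lower than a plain accuracy on an imbalanced dataset.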

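🎯 Application scenarios

The work can be applied to automated analysis of gastrointestinal endoscopy images, supporting physicians in diagnosis and improving diagnostic efficiency and accuracy. Automating the review of large numbers of endoscopy images could reduce physician workload and lower the rate of missed findings. The approach may also extend to other medical imaging domains.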

📄 Abstract (original)

Video capsule endoscopy has transformed gastrointestinal endoscopy (GIE) diagnostics by offering a non-invasive method for capturing detailed images of the gastrointestinal tract, enabling early disease detection. However, its potential is limited by the sheer volume of images generated during the imaging procedure, which can take anywhere from 6-8 hours and often produce up to 1 million images, necessitating automated analysis. Additionally, the variability of these images, combined with the need for expert annotations and the scarcity of large, high-quality labeled datasets, constrains the effectiveness of current medical image analysis models. To address this, we introduce a novel large GIE dataset, called EndoExtend24, created by merging ten existing public and private datasets, ensuring patient integrity across splits. EndoExtend24 includes over 226,000 labeled images, as well as dynamic class mappings, which allow unified training across datasets with differing labeling granularity, supporting up to 123 distinct pathological findings. Further, we propose to leverage domain adaptive pre-training of foundation models trained with self-supervision on generic image data, to adapt them to the task of GIE medical image diagnosis. Specifically, the EVA-02 model, which is based on the ViT architecture and trained on ImageNet-22k with masked image modeling (using EVA-CLIP as a MIM teacher), is pre-trained on the EndoExtend24 dataset to achieve domain adaptation, and finally trained on the Capsule Endoscopy 2024 Challenge dataset. Our model demonstrates robust performance, securing third place in the Capsule Endoscopy 2024 Challenge. We achieved a macro AUC of 0.762 and a balanced accuracy of 37.1% on the test set. These results emphasize the effectiveness of our domain-adaptive pre-training approach and the enriched EndoExtend24 dataset in advancing gastrointestinal endoscopy diagnostics.
