Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness

📄 arXiv: 2605.07055v1 📥 PDF

Authors: Qiangqiang Wu, Grace McIlvain, Zhou Yu, Junhao Wen

Categories: cs.CV, cs.AI

Published: 2026-05-08


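💡 One-Sentence Summary

Proposes Pan-FM, a pan-organ foundation model with saliency-guided masking that improves robustness to missing data in multimodal medical imaging.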

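🎯 Matched areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

Keywords: medical foundation models, multimodal learning, saliency-guided masking, self-supervised learning, robust learning, whole-body representation learning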

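📋 Key Points

  1. Existing medical foundation models are mostly limited to a single organ and cope poorly with data that are missing not at random, which is common in practice and hurts generalization.
  2. Pan-FM uses a unified backbone for multi-organ input and adds Saliency-Guided Masking (SGM), which adaptively masks dominant organs during pre-training.
  3. On the UK Biobank, Pan-FM outperforms single-organ and multi-organ baselines across 13 disease categories and is more robust when organs are missing.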

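📝 Abstract (Summary)

Foundation models (FMs) show great promise in medical imaging, but most existing models are limited to a single modality or a specific organ. Because human aging and disease arise from coordinated processes across organs, multimodal whole-body foundation models are needed. Real-world biomedical data, however, are often missing not at random, which limits generalization and introduces bias. This paper proposes Pan-FM, a pan-organ foundation model pre-trained on seven organs (brain, heart, adipose, liver, kidney, spleen, and pancreas). The authors find that naive multimodal pre-training leads to dominant-organ shortcut learning, and introduce Saliency-Guided Masking (SGM), which adaptively masks dominant organs to encourage more balanced whole-body representation learning. On the UK Biobank, Pan-FM outperforms single-organ and multi-organ baselines and is more robust when modalities are missing.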

🔬 Method Details

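Problem definition: Multimodal medical learning must handle data that are missing not at random. In addition, naively pre-trained models over-rely on dominant organs such as adipose and heart, so the remaining organs are under-represented. The paper calls this dominant-organ shortcut learning bias.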

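Core idea: Saliency-Guided Masking (SGM) uses the model's own attention distribution to identify dominant organs during pre-training and mask them adaptively. With the dominant organs hidden, the model has to draw on the organs it would otherwise neglect, which encourages more balanced whole-body representation learning.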
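The idea above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not say how attention is aggregated per organ or how many organs are masked, so the function names, the per-organ sum of attention mass, the `temperature` parameter, and the rule of always keeping one organ visible are all assumptions made for illustration.

```python
import numpy as np

# Organ order is illustrative; the index is what the functions below use.
ORGANS = ["brain", "heart", "adipose", "liver", "kidney", "spleen", "pancreas"]

def organ_saliency(attn, organ_ids, present):
    """Share of attention mass received by each organ.

    attn:      (T,) non-negative attention per token (assumed: CLS-to-token).
    organ_ids: (T,) integer organ index of each token.
    present:   (K,) bool, True where the organ was actually acquired.
    """
    k = present.shape[0]
    mass = np.bincount(organ_ids, weights=attn, minlength=k).astype(float)
    mass = np.where(present, mass, 0.0)
    total = mass.sum()
    if total <= 0:
        # No usable attention signal: fall back to uniform over present organs.
        return present / max(present.sum(), 1)
    return mass / total

def saliency_guided_mask(attn, organ_ids, present, n_mask=1,
                         temperature=1.0, rng=None):
    """Sample organs to mask, favouring the most attended (dominant) ones."""
    rng = np.random.default_rng() if rng is None else rng
    sal = organ_saliency(attn, organ_ids, present)
    # Assumption: never mask every present organ, keep at least one visible.
    n_mask = min(n_mask, max(int(present.sum()) - 1, 0))
    masked = np.zeros_like(present, dtype=bool)
    if n_mask == 0:
        return masked
    # temperature < 1 sharpens the distribution, > 1 flattens it.
    p = np.where(present, sal ** (1.0 / temperature) + 1e-8, 0.0)
    p /= p.sum()
    chosen = rng.choice(len(present), size=n_mask, replace=False, p=p)
    masked[chosen] = True
    return masked
```

In a training loop, the returned boolean vector would be applied to the student's input before the next forward pass, so that an organ absorbing most of the attention is hidden more often than the others.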

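Framework: Pan-FM uses a unified backbone (described in this summary as a Transformer) that accepts multi-organ input and handles missing organs during both training and inference. Pre-training uses masking-based self-distillation, and the SGM module adjusts the input mask dynamically during this stage.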
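One common way for a single attention-based backbone to tolerate missing inputs is to exclude the missing tokens as attention keys and from the final pooling. The sketch below shows that generic technique on a single attention head without learned projections. The source does not describe how Pan-FM handles missingness internally, so this is an assumption for illustration, and `masked_self_attention` and `pool_present` are hypothetical names.

```python
import numpy as np

def masked_self_attention(x, token_present):
    """Single-head self-attention that ignores missing tokens as keys.

    x:             (T, D) token embeddings, missing slots filled with finite values.
    token_present: (T,) bool, False for tokens of organs that are missing.
    Returns the attended tokens (T, D) and the attention weights (T, T).
    """
    if not token_present.any():
        raise ValueError("at least one token must be present")
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    # Missing keys get -inf so their softmax weight is exactly zero.
    scores = np.where(token_present[None, :], scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ x, w

def pool_present(out, token_present):
    """Mean-pool over present tokens only, giving one subject-level vector."""
    return out[token_present].mean(axis=0)
```

Under this scheme the pooled representation depends only on the organs that are present, so the same weights can serve any subset of organs at inference time.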

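Key innovation: SGM is the central contribution. It treats the model's attention weights as a feedback signal and uses them to suppress dominant organs adaptively. It requires no extra annotation, adds negligible computational overhead, and can be integrated into existing self-supervised learning frameworks.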

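Key design: All organs are pre-trained jointly. According to this summary, the objective combines masked prediction with a self-distillation consistency term, and the masking is adjusted dynamically to balance how much each organ contributes to learning. The original abstract confirms only the masking-based self-distillation and the adaptive masking of dominant organs.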
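For readers unfamiliar with masking-based self-distillation, the sketch below shows a generic DINO/iBOT-style formulation: a student sees the masked input and is trained to match a teacher's output distribution on the masked tokens, while the teacher is an exponential moving average of the student. The exact loss, the temperatures, and the momentum used by Pan-FM are not given in the source, so every value and name here is an assumption.

```python
import numpy as np

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def self_distillation_loss(student_logits, teacher_logits, masked,
                           t_student=0.1, t_teacher=0.04):
    """Cross-entropy H(teacher, student), averaged over masked tokens.

    student_logits, teacher_logits: (T, K) per-token logits.
    masked: (T,) bool, True where the student's input token was masked.
    Temperatures are assumed defaults, not values from the paper.
    """
    if not masked.any():
        return 0.0
    p_teacher = softmax(teacher_logits[masked], t_teacher)
    log_p_student = log_softmax(student_logits[masked], t_student)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def ema_update(teacher, student, momentum=0.996):
    """Move each teacher parameter a small step toward the student's."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}
```

In this formulation gradients flow only through the student; the teacher is updated by `ema_update` after each optimizer step.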

🖼️ Key Figures

fig_0
fig_1
fig_2

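📊 Experimental Highlights

On the UK Biobank, Pan-FM achieves stronger prediction than single-organ and multi-organ baselines across 13 disease categories and 14 single disease entities. It is also more robust in missing-organ settings, where it mitigates the performance drop that missing data would otherwise cause.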

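🎯 Application Scenarios

The work is relevant to large clinical cohort studies such as the UK Biobank, particularly for multi-organ joint diagnosis, whole-body disease risk assessment, and analyses where some imaging modalities are missing. It could potentially extend to multimodal clinical decision support systems, improving generalization and robustness in complex clinical settings.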

📄 Abstract (Original)

Foundation models (FMs) have shown great promise in medical imaging, but most FMs are trained on unimodal data within isolated domains, such as brain MRI alone. Human aging and disease arise through coordinated biological processes across organs, therefore motivating multimodal FMs that learn whole-body representations. A key challenge, however, is that real-world multimodal biomedical data are often missing not at random, which can reduce power, limit generalizability, and introduce bias. We propose Pan-FM, a pan-organ foundation model pre-trained on imaging from seven organs (Brain, Heart, Adipose, Liver, Kidney, Spleen, and Pancreas) under realistic missing-organ scenarios. Pan-FM uses a unified backbone that handles organ missingness during both training and inference, and is pre-trained with masking-based self-distillation. We find that naive multimodal pre-training leads to dominant-organ shortcut learning bias, with the model over-relying on dominant organs such as adipose and heart. To address this, we introduce Saliency-Guided Masking (SGM), which uses the model attention distribution to adaptively mask dominant organs during pre-training, thus encouraging more balanced cross-organ, whole-body learning. Notably, SGM introduces negligible computational overhead and can be seamlessly integrated into existing self-supervised learning frameworks to improve multi-organ representation learning. On the UK Biobank, Pan-FM achieves stronger prediction across 13 disease categories and 14 single disease entities than single-organ and multi-organ baselines, with improved robustness under missing-organ settings. Pan-FM serves as a scalable solution to realistic modality-missingness in multimodal learning in system neuroscience and as a step toward more generalizable whole-body FMs.