Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis
Authors: Sanket Kachole, Siddhesh Thakur, Shubham Innani, Sanyukta Adap, Suhang You, Carla Pitarch-Abaigar, Spyridon Bakas
Categories: cs.CV
Published: 2026-06-05
💡 One-sentence takeaway
Proposes the Multi-FRuGaL framework to handle missing multimodal data in cancer diagnosis and prognosis
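🎯 Matched areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models
Keywords: multimodal fusion, cancer diagnosis, deep learning, missing data, gated learning, signal decomposition, clinical data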
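📋 Key points
- Existing multimodal fusion methods perform poorly when modalities are missing or sparse, which limits their use in cancer diagnosis.
- Multi-FRuGaL uses decomposition-aware, adaptive gating to learn modality-level representations even when data are missing.
- On two head and neck cancer cohorts, Multi-FRuGaL outperforms the evaluated baselines: AUC rises for 5-year survival and 2-year recurrence prediction on HANCOCK, and HPV status classification on HECKTOR reaches 0.975 AUC.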
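📝 Abstract (summary)
Modern medicine relies on heterogeneous data sources spanning radiology, pathology, text reports, and structured clinical information. Real-world patient data, however, are often incomplete, and missing or sparse modalities limit the effectiveness of standard multimodal fusion methods. To address this, the paper proposes the Multimodal Flexible Redundancy-aware Decomposed Gated Learning framework (Multi-FRuGaL), which performs modality-level representation learning under missing data. Multi-FRuGaL combines per-modality encoders, a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective. Together these separate redundant signals from modality-specific complementary signals, selectively upweight informative modalities, and suppress redundant or noisy inputs. The authors evaluate Multi-FRuGaL on two multimodal head and neck cancer cohorts, where it outperforms the baseline methods on multiple tasks.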
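🔬 Method details
Problem definition: The paper addresses the poor performance of standard multimodal fusion methods when modalities are missing in cancer diagnosis and prognosis. Existing methods often cannot handle missing or sparse modalities effectively, which limits their usefulness in practice.
Core idea: Multi-FRuGaL performs modality-level representation learning with decomposition-aware, adaptive gating, so that useful information can still be extracted when data are missing. The design aims to upweight informative modalities while suppressing redundant or noisy inputs.
Technical framework: The architecture comprises four main components: per-modality encoders, a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective. These components work together to process and fuse the modality signals.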
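To make the idea of separating redundant from modality-specific signal concrete, here is a minimal sketch. It is not the paper's decomposition layer, which is learned and not specified in this summary. As a simple stand-in, it assumes the shared (redundant) part is the element-wise mean of the available embeddings and the modality-specific part is each embedding's residual. The function name `decompose` and the modality names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def decompose(
    embeddings: Dict[str, Optional[Vector]],
) -> Tuple[Vector, Dict[str, Vector]]:
    """Split available embeddings into a shared part and per-modality residuals.

    Assumption: the shared part is the mean over available modalities.
    The learned decomposition layer in the paper may differ.
    """
    # Missing modalities are represented as None and skipped.
    available = {name: vec for name, vec in embeddings.items() if vec is not None}
    if not available:
        raise ValueError("at least one modality must be present")
    dim = len(next(iter(available.values())))
    # Shared component: element-wise mean over the modalities that are present.
    shared = [
        sum(vec[i] for vec in available.values()) / len(available)
        for i in range(dim)
    ]
    # Modality-specific component: what remains after removing the shared part.
    specific = {
        name: [v - s for v, s in zip(vec, shared)]
        for name, vec in available.items()
    }
    return shared, specific


example = {
    "ct": [1.0, 2.0],
    "pathology": [3.0, 2.0],
    "clinical": None,  # missing modality
}
shared_part, specific_parts = decompose(example)
```

Because the mean is taken only over modalities that are present, the split stays defined for any non-empty subset of inputs.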
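Key innovation: The main novelty is the redundancy-aware decomposed gated learning mechanism, which remains well-defined and performs well even when multiple modalities are absent. Unlike existing fusion methods, it adjusts the weight given to each modality dynamically.
Key design: An input-conditioned gating network adjusts the modality weights dynamically, and an information-aware loss function optimizes the fusion. Each modality is first processed by its own deep encoder, which extracts features before decomposition and gating.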
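One common way to keep gated fusion well-defined under missing modalities is a masked softmax: gate scores are normalized only over the modalities that are present, and absent ones receive zero weight. The sketch below illustrates that general technique under stated assumptions; it is not the authors' gating network. In particular, the gate scores are passed in directly here, whereas an input-conditioned gate would compute them from the inputs. The names `masked_gate_weights` and `fuse` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def masked_gate_weights(scores: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Softmax over the scores of present modalities; missing ones get 0.0."""
    present = {name: s for name, s in scores.items() if s is not None}
    if not present:
        raise ValueError("at least one modality must be present")
    # Subtract the maximum score for numerical stability.
    max_score = max(present.values())
    exps = {name: math.exp(s - max_score) for name, s in present.items()}
    total = sum(exps.values())
    return {name: exps.get(name, 0.0) / total for name in scores}


def fuse(embeddings: Dict[str, Optional[Vector]], weights: Dict[str, float]) -> Vector:
    """Weighted sum of the embeddings that are present."""
    present = {name: vec for name, vec in embeddings.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(next(iter(present.values())))
    fused = [0.0] * dim
    for name, vec in present.items():
        for i, value in enumerate(vec):
            fused[i] += weights[name] * value
    return fused


# Assumed toy inputs: "text" is missing, so its score and embedding are None.
gate_scores = {"ct": 0.0, "pathology": 0.0, "text": None}
embeddings = {"ct": [1.0, 0.0], "pathology": [0.0, 1.0], "text": None}
weights = masked_gate_weights(gate_scores)
fused = fuse(embeddings, weights)
```

With equal scores for the two present modalities, each receives weight 0.5 and the missing one receives 0.0, so the weights always sum to 1 over whatever subset of modalities is available.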
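📊 Experimental highlights
Multi-FRuGaL outperforms the evaluated baselines across tasks: AUC improves from 0.601 to 0.8496 for survival prediction and from 0.672 to 0.8102 for recurrence prediction, and HPV status classification reaches an AUC of 0.975.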
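For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition on made-up toy data. It is not the paper's evaluation code, and the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data, not from the paper.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for the values reported above.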
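🎯 Application scenarios
Potential applications include cancer diagnosis and prognosis assessment, particularly where multimodal data are involved. Because Multi-FRuGaL tolerates the missing data common in clinical records, it could improve the accuracy of survival and recurrence prediction for cancer patients.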
📄 Abstract (original)
Modern medicine relies on heterogeneous data sources spanning radiology, pathology, text reports, and structured clinical information. However, real-world patient data are frequently incomplete, with missing or sparsely acquired modalities, limiting the effectiveness of standard multimodal fusion approaches. To this end, we propose the Multimodal Flexible Redundancy-aware decomposed GAted Learning (Multi-FRuGaL) framework, a decomposition-aware, adaptive gated intermediate-fusion framework that performs modality-level representation learning under missing data. Multi-FRuGaL integrates per-modality encoders with a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective to separate redundant from modality-specific complementary signals, selectively upweighting informative modalities and suppressing redundant or noisy inputs, and remaining well-defined even when multiple modalities are absent. We evaluate Multi-FRuGaL on two multimodal head and neck cancer cohorts: the HANCOCK challenge dataset (N = 763) comprising five modalities and two prognostic endpoints (5-year survival and 2-year recurrence), and the HECKTOR challenge dataset (N = 588) comprising three modalities for human papillomavirus (HPV) status classification. Multi-FRuGaL consistently achieves higher mean performance than the evaluated baselines across multiple tasks, improving AUC from 0.601 to 0.8496 for survival, from 0.672 to 0.8102 for recurrence, and achieving 0.975 AUC for HPV prediction on HECKTOR. For survival analysis, it further achieves a concordance index of 0.6814 for overall survival, 0.7421 for recurrence-free survival, and 0.7143 for progression-free survival on HANCOCK, and 0.7203 for recurrence-free survival on HECKTOR. Qualitative analyses further show that Multi-FRuGaL learns discriminative and robust multimodal representations, even under severe missing-modality conditions.
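The concordance index reported for the survival analyses measures how often a model assigns higher risk to the patient who experiences the event earlier. The sketch below implements the standard Harrell-style definition on made-up toy data, assuming right-censored observations where `events[i] == 1` marks an observed event and `0` marks censoring. It is not the authors' evaluation code, and the function name `concordance_index` here is a local, hypothetical helper.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell-style C-index for right-censored data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's recorded time.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, not from the paper: the third patient is censored at time 6.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)  # 4 of 5 pairs -> 0.8
```

As with AUC, 0.5 corresponds to random ordering and 1.0 to perfect ordering of risks.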