FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

📄 arXiv: 2606.11106v1

Authors: Mahmood Alzubaidi, Uzair Shah, Raden Muaz, Ines Abbes, Nader Mohammed, Abdullatif Magram, Khalid Alyafei, Mowafa Househ, Marco Agus

Categories: cs.CV, cs.AI

Published: 2026-06-09

🔗 Code/project: https://github.com/mahmoodphd/FADA


💡 One-Sentence Summary

FADA is proposed to address the shortage of trained sonographers for prenatal ultrasound in low- and middle-income countries.

🎯 Matched areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

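Keywords: prenatal ultrasound, deep learning, unified model, selective distillation, clinical interpretation, resource-constrained settings, AI-assisted diagnosis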

📋 Key Points

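  1. Prenatal ultrasound screening is constrained by a shortage of trained sonographers, so many pregnant women in low- and middle-income countries never receive a skilled examination. Existing deep learning methods tackle detection, segmentation, or classification in isolation, each with its own model and expert-specified labels.
  2. FADA is a unified vision-language model that performs clinical interpretation, classification, detection, and segmentation in a single pipeline, without external labels at inference.
  3. FADA-SKD performs strongly on segmentation and detection, with a mean Dice of 0.8820 for segmentation and an mAP@0.50 of 0.7671 for detection, and reaches 100% compliance with the structured interpretation format.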

📝 Summary

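A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where more than half of pregnant women receive no skilled sonography. Existing deep learning methods handle detection, segmentation, or classification separately, each requiring its own model and expert-specified labels. This paper presents FADA, a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation through a single interpretation-first pipeline without external labels. FADA distills knowledge from four domain-specific foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) through offline pre-computed feature caching. Selective distillation applies feature alignment only to the annotation tasks, while interpretation relies on standard fine-tuning; this outperforms full distillation on most evaluation axes. FADA-SKD achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance, and it can be deployed on edge devices in resource-constrained settings.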

🔬 Method Details

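Problem definition: The paper addresses the shortage of trained sonographers that limits prenatal ultrasound screening in low- and middle-income countries. Existing deep learning methods treat detection, segmentation, and classification as separate problems, each needing its own model and expert-specified labels at inference, which limits their practical use.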

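Core idea: FADA combines clinical interpretation, classification, detection, and segmentation in one unified vision-language model. Its interpretation-first pipeline removes the need for externally supplied labels at inference.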
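The interpretation-first idea can be illustrated as control flow. This is a minimal sketch under stated assumptions: the `UnifiedVLM` interface, its method names, the step order, the `fetal_brain` label, and the `ToyModel` stub are all hypothetical and are not FADA's code. The only point it shows is that the annotation steps consume the model's own predicted label instead of an expert-supplied one.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), hypothetical box format


class UnifiedVLM(Protocol):
    """Hypothetical interface; FADA's real prompts and output formats are not shown here."""
    def interpret(self, image) -> str: ...
    def classify(self, image, interpretation: str) -> str: ...
    def detect(self, image, label: str) -> List[Box]: ...
    def segment(self, image, label: str, boxes: List[Box]) -> List[List[int]]: ...


@dataclass
class PipelineResult:
    interpretation: str
    label: str
    boxes: List[Box] = field(default_factory=list)
    mask: List[List[int]] = field(default_factory=list)


def run_interpretation_first(model: UnifiedVLM, image) -> PipelineResult:
    # 1) Interpret first: the model describes the image in free text.
    interpretation = model.interpret(image)
    # 2) The class comes from the model's own interpretation, not from a user-supplied label.
    label = model.classify(image, interpretation)
    # 3) Annotation steps are conditioned on that self-predicted label.
    boxes = model.detect(image, label)
    mask = model.segment(image, label, boxes)
    return PipelineResult(interpretation, label, boxes, mask)


class ToyModel:
    """Hypothetical stand-in returning fixed outputs so the control flow can run."""
    def interpret(self, image):
        return "Fetal head in an axial view; midline echo visible."

    def classify(self, image, interpretation):
        return "fetal_brain" if "head" in interpretation else "other"

    def detect(self, image, label):
        return [(1, 1, 3, 3)] if label == "fetal_brain" else []

    def segment(self, image, label, boxes):
        h, w = len(image), len(image[0])
        mask = [[0] * w for _ in range(h)]
        for x0, y0, x1, y1 in boxes:  # fill each box as a crude mask
            for y in range(y0, y1):
                for x in range(x0, x1):
                    mask[y][x] = 1
        return mask


result = run_interpretation_first(ToyModel(), [[0] * 4 for _ in range(4)])
print(result.label, result.boxes, sum(map(sum, result.mask)))
```

In a real system each step would be a prompt to the same vision-language model; keeping them as separate calls here just makes the data dependency between steps visible.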

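Technical framework: FADA is built on Qwen3.5-VL and adds an offline pre-computed feature cache and a selective distillation module. Features from the four teacher foundation models are computed once, offline, and cached; the annotation tasks are then trained with feature alignment against these cached features, while interpretation is trained with standard fine-tuning.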
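A minimal sketch of offline feature caching, assuming each teacher can be treated as a function from image bytes to a feature vector. The dictionary keys reuse the four teacher names from the paper purely as labels; the stub functions are hypothetical stand-ins for the real models, and `build_feature_cache` and `lookup` are hypothetical helpers. The point is that every teacher runs once per image before training, so the training loop only performs lookups.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Feature = List[float]
Teacher = Callable[[bytes], Feature]
Cache = Dict[str, Dict[str, Feature]]


def image_key(image_bytes: bytes) -> str:
    # Content hash: identical images map to the same cache entry.
    return hashlib.sha256(image_bytes).hexdigest()


def build_feature_cache(images: Sequence[bytes], teachers: Dict[str, Teacher]) -> Cache:
    """Offline pass: run every teacher once per image before student training starts."""
    cache: Cache = {}
    for img in images:
        key = image_key(img)
        if key not in cache:  # skip duplicate images
            cache[key] = {name: fn(img) for name, fn in teachers.items()}
    return cache


def lookup(cache: Cache, image_bytes: bytes, teacher_name: str) -> Feature:
    # Training-time access is a dictionary read; no teacher forward pass happens here.
    return cache[image_key(image_bytes)][teacher_name]


# Hypothetical stubs standing in for the real teacher models; they only count calls.
calls = {"n": 0}


def make_stub(scale: float) -> Teacher:
    def teacher(img: bytes) -> Feature:
        calls["n"] += 1
        return [scale * len(img)]
    return teacher


teacher_names = ["FetalCLIP", "UltraSAM", "USF-MAE", "UltraFedFM"]
teachers = {name: make_stub(float(i + 1)) for i, name in enumerate(teacher_names)}
images = [b"img-a", b"img-bb"]

cache = build_feature_cache(images, teachers)
for epoch in range(3):  # several training epochs reuse the same cached features
    for img in images:
        target = lookup(cache, img, "UltraSAM")
print(calls["n"])  # 8: 2 images x 4 teachers, independent of the number of epochs
```

Because the teachers are never invoked inside the training loop, their weights need not share GPU memory with the student, which is consistent with the paper's claim that training fits on a single consumer GPU.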

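Key innovation: selective distillation applies feature alignment only to the annotation tasks. Compared with full distillation, which applies alignment to every task including interpretation, it performs better on most evaluation axes, making more effective use of the domain-specific teacher knowledge.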
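The difference between selective and full distillation comes down to which tasks receive the alignment term. The sketch below is hedged: it assumes a cosine-distance alignment loss, a hypothetical weight `lam`, and that classification, detection, and segmentation form the annotation set. The paper's actual loss, weighting, and task split may differ.

```python
import math
from typing import Sequence

# Assumption: every task except free-text interpretation counts as "annotation".
ANNOTATION_TASKS = frozenset({"classification", "detection", "segmentation"})


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 when student and teacher features point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat a zero vector as maximally misaligned
    return 1.0 - dot / (na * nb)


def selective_kd_loss(task, task_loss, student_feat, teacher_feat, lam=0.5):
    """Alignment term only for annotation tasks; interpretation keeps the plain fine-tuning loss."""
    if task in ANNOTATION_TASKS:
        return task_loss + lam * cosine_distance(student_feat, teacher_feat)
    return task_loss


def full_kd_loss(task, task_loss, student_feat, teacher_feat, lam=0.5):
    """Full distillation: alignment term for every task ('task' kept only for signature parity)."""
    return task_loss + lam * cosine_distance(student_feat, teacher_feat)


s, t = [1.0, 0.0], [0.0, 1.0]  # orthogonal features, so cosine distance is 1.0
print(selective_kd_loss("interpretation", 2.0, s, t))  # 2.0
print(selective_kd_loss("segmentation", 2.0, s, t))    # 2.5
print(full_kd_loss("interpretation", 2.0, s, t))       # 2.5
```

The only behavioral difference is the interpretation branch: under selective distillation the language-generation objective is left untouched by the teacher features.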

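Key design: FADA-SKD combines the feature-cache mechanism, the selective distillation strategy, and training that fits on a single consumer GPU, which supports deployment in resource-constrained settings.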

📊 Experimental Highlights

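FADA-SKD reaches a mean Dice of 0.8820 on segmentation and an mAP@0.50 of 0.7671 on detection, with 100% compliance with the structured interpretation format. In expert sonographer validation on 237 images, 73.5% of interpretations received a perfect score under clinician guidance, supporting the clinical acceptability of the model's outputs.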
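The two headline metrics can be made concrete. Dice measures mask overlap as 2|A ∩ B| / (|A| + |B|), and mAP@0.50 counts a predicted box as a true positive when its IoU with a still-unmatched ground-truth box is at least 0.50. The sketch below computes single-class AP with VOC-style all-point interpolation; mAP is the mean of this value over classes. The paper's exact evaluation protocol (interpolation scheme, matching rules) is not given here, so treat this as an illustration, not a reproduction of its numbers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: perfect agreement


def box_iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap_at_50(preds: List[Tuple[float, Box]], gts: List[Box]) -> float:
    """Single-class AP at IoU 0.50 with VOC-style all-point interpolation."""
    if not gts:
        return 0.0  # AP is undefined without ground truth; 0.0 is a convention here
    matched = set()
    tp = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily match to the best still-unmatched ground truth with IoU >= 0.50.
        best_j, best_iou = -1, 0.5
        for j, g in enumerate(gts):
            iou = box_iou(box, g)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
        tp.append(1 if best_j >= 0 else 0)
    recalls, precisions, hits = [0.0], [1.0], 0
    for i, hit in enumerate(tp, start=1):
        hits += hit
        recalls.append(hits / len(gts))
        precisions.append(hits / i)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


print(dice([1, 1, 1, 0], [1, 1, 0, 0]))                  # 0.8
print(round(box_iou((0, 0, 2, 2), (1, 0, 3, 2)), 3))     # 0.333, below the 0.50 threshold
gts = [(0, 0, 2, 2), (4, 4, 6, 6)]
preds = [(0.9, (0, 0, 2, 2)), (0.8, (10, 10, 12, 12)), (0.7, (4, 4, 6, 6))]
print(round(ap_at_50(preds, gts), 4))                    # 0.8333
```

In the example, the false positive ranked second lowers precision at that point, which is why AP falls below 1.0 even though both ground-truth boxes are eventually found.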

🎯 Application Scenarios

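FADA has strong application potential for prenatal ultrasound screening in low- and middle-income countries. By pairing AI-assisted fetal assessment with portable ultrasound devices, the system directly addresses gaps in diagnostic access in resource-constrained settings and could help improve prenatal care for pregnant women.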

📄 Abstract (Original)

A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no skilled sonography. Current deep learning approaches address detection, segmentation, or classification in isolation, each demanding a separate model and expert-specified labels at inference. We present FADA, a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation through a single interpretation-first pipeline without external labels. FADA distills knowledge from four domain-specific foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) via offline pre-computed feature caching. Selective distillation, which applies feature alignment only to annotation tasks while interpretation relies on standard fine-tuning, consistently outperforms full distillation across most evaluation axes. The recommended variant, FADA-SKD, achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer validation across 237 images confirms clinically acceptable outputs in both autonomous and human-in-the-loop modes, with 73.5% of interpretations scoring perfectly under clinician guidance. The system is trainable on a single consumer GPU and deployable without cloud connectivity. We validate edge deployment by running the compressed 0.8B model on a commodity smartphone (Qualcomm Snapdragon 7 Gen 1, 12 GB RAM) using llama.cpp with GGUF quantization, completing the full 5-phase pipeline in approximately 60 seconds entirely offline. This establishes a practical pathway for integrating AI-assisted fetal assessment with portable ultrasound devices, directly addressing diagnostic access gaps in resource-constrained settings. Code, models, and data are available at https://github.com/mahmoodphd/FADA.
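A back-of-envelope estimate shows why a 0.8B model fits comfortably on a 12 GB phone. The abstract does not say which GGUF quantization type was used, so the bits-per-weight values below are assumptions based on common llama.cpp types (Q8_0 and Q4_0 store 32 weights plus an fp16 scale per block, giving 8.5 and 4.5 bits per weight). The estimate covers weights only and ignores the KV cache, any separate vision projector file, and runtime overhead.

```python
# Assumed bits per weight for common GGUF types (per-block fp16 scales included).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 0.8e9  # "0.8B" as stated in the abstract; the exact count is not given
PHONE_RAM_GB = 12  # RAM of the test smartphone reported in the abstract

for name, bpw in BITS_PER_WEIGHT.items():
    gb = weight_gb(N_PARAMS, bpw)
    print(f"{name}: ~{gb:.2f} GB of weights ({gb / PHONE_RAM_GB:.0%} of {PHONE_RAM_GB} GB RAM)")
```

Even unquantized F16 weights (about 1.6 GB) would use a small fraction of the device's memory; quantization mainly buys faster loading and inference on the phone's CPU.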