A multi-agent system for spine MRI report generation from multi-sequence imaging
Authors: Zhiping Xiao, Junwei Yang, Gongbo Sun, Han Zhang, Hanwen Xu, Yi Yao, Zachary D. Miller, William E. King, Mohammed M. Kanani, Jalal B. Andre, Sammy Chu, Ming Zhang, Paul E. Kinahan, Nathan M. Cross, Sheng Wang
Categories: cs.CV, cs.AI, q-bio.QM
Published: 2026-06-08
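💡 One-Sentence Summary
Proposes SpineAgent, a multi-agent system that addresses the complexity of spine MRI report generation
🎯 Matched Area: Pillar 9: Embodied Foundation Models
Keywords: spine MRI, multi-agent system, report generation, multimodal fusion, automated analysis, medical imaging, deep learning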
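📋 Key Points
- Existing spine MRI analysis methods struggle to integrate multi-sequence data, which makes interpretation complex and time-consuming.
- The paper proposes SpineAgent, which combines a multi-agent framework with a multi-sequence foundation model to generate spine MRI reports.
- SpineAgent generalizes well under cross-manufacturer and cross-cohort evaluation, and achieves leading report-generation performance on both automated metrics and expert evaluation by five radiologists.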
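📝 Abstract (Summary)
Spinal pathology is a leading cause of pain and disability worldwide. Spine MRI is central to clinical evaluation, but its interpretation is complex and time-consuming because it requires integrating information across multiple imaging sequences and anatomical regions. Despite progress in automated MRI analysis, effectively combining multi-sequence data while preserving sequence-specific diagnostic information remains a challenge. This paper presents SpineAgent, a multi-agent framework built on a multi-sequence foundation model trained on clinical data from 32,047 patients and 453,683 MRI series. SpineAgent supports pathology localization and image-report retrieval, integrates these capabilities into 37 specialized agents, and achieves leading performance in spine MRI report generation.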
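🔬 Method Details
Problem definition: The paper targets the information-integration problem in spine MRI report generation. Existing methods have difficulty preserving sequence-specific diagnostic information when they process multi-sequence data.
Core idea: SpineAgent builds a multi-agent framework on top of a multi-sequence foundation model, combining information from different MRI sequences to generate MRI reports.
Technical framework: The framework has two main stages. First, two DINOv3-based encoders are pre-trained separately on T1- and T2-weighted sequences. Second, a continual training strategy learns a synthesizer that embeds images from other sequences using the T1 and T2 encoders, producing a patient-level embedding that integrates signals across MRI sequences.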
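The two-stage design above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `t1_encoder` and `t2_encoder` functions stand in for the DINOv3-based encoders, the `synthesizer` is modeled as a fixed blend of the two encoder outputs (the paper learns it through continual training), and mean pooling is only one plausible way to form a patient-level embedding. None of these names come from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def t1_encoder(series: Sequence[float]) -> Vector:
    """Toy stand-in for the T1 encoder (hypothetical features: mean, max)."""
    return [sum(series) / len(series), max(series)]


def t2_encoder(series: Sequence[float]) -> Vector:
    """Toy stand-in for the T2 encoder (hypothetical features: min, mean)."""
    return [min(series), sum(series) / len(series)]


def synthesizer(series: Sequence[float], weight: float = 0.5) -> Vector:
    """Embed a non-T1/T2 series by blending both encoder outputs.

    Assumption: a fixed blend weight replaces the learned synthesizer.
    """
    from_t1 = t1_encoder(series)
    from_t2 = t2_encoder(series)
    return [weight * a + (1.0 - weight) * b for a, b in zip(from_t1, from_t2)]


def embed_series(sequence_type: str, series: Sequence[float]) -> Vector:
    """Route a series to the matching encoder, or to the synthesizer."""
    if sequence_type == "T1":
        return t1_encoder(series)
    if sequence_type == "T2":
        return t2_encoder(series)
    return synthesizer(series)


def patient_embedding(study: Dict[str, Sequence[float]]) -> Vector:
    """Mean-pool per-series embeddings into one patient-level vector."""
    vectors = [embed_series(name, series) for name, series in study.items()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


study = {"T1": [1.0, 3.0], "T2": [2.0, 4.0], "STIR": [0.0, 2.0]}
embedding = patient_embedding(study)
```

The point of the routing step is that T1 and T2 series keep their own encoders, while any other sequence (here a hypothetical `"STIR"` entry) is embedded through both, so sequence-specific signal is not collapsed before pooling.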
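Key innovation: The main contributions are the continual training strategy and the multi-agent framework, which together allow information from different sequences to be integrated effectively and enable pathology localization and multimodal image-report retrieval.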
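Image-report retrieval over a shared embedding space is commonly implemented as nearest-neighbor ranking. The sketch below assumes cosine similarity as the ranking score and uses made-up two-dimensional vectors; the source does not state which similarity measure SpineAgent uses, and the `retrieve` function and report names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[str]:
    """Return the names of the top_k candidates most similar to the query."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical image embedding and report embeddings.
image_embedding = [1.0, 0.0]
report_embeddings = {
    "report_a": [0.9, 0.1],
    "report_b": [0.0, 1.0],
    "report_c": [-1.0, 0.0],
}
best_matches = retrieve(image_embedding, report_embeddings, top_k=2)
```

The same function works in the other direction (report embedding as the query, image embeddings as candidates), which is why a shared embedding space supports retrieval both ways.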
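Key design: DINOv3-based encoders are used for feature extraction, the loss function is described as tailored to multi-sequence data, and the network architecture supports the fusion and processing of multimodal information. With these design choices, SpineAgent achieves leading performance in report generation.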
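The original abstract also states that the outputs of the 37 specialized agents are incorporated as structured tokens within a Medical Report Agent. The sketch below shows one way such a serialization step could look. The `AgentFinding` record, the agent names, the `<agent|level|label>` token format, and the confidence threshold are all hypothetical; the paper's actual token scheme is not described in this summary.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AgentFinding:
    """One output from a specialized agent (hypothetical schema)."""
    agent: str    # which agent produced the finding
    level: str    # anatomical level, e.g. "L4-L5"
    label: str    # predicted label
    score: float  # agent confidence in [0, 1]


def to_structured_tokens(
    findings: Iterable[AgentFinding],
    threshold: float = 0.5,
) -> List[str]:
    """Serialize confident findings into a deterministic token sequence."""
    tokens = []
    # Sort by level, then agent, so the same findings always yield the same order.
    for finding in sorted(findings, key=lambda f: (f.level, f.agent)):
        if finding.score >= threshold:
            tokens.append(f"<{finding.agent}|{finding.level}|{finding.label}>")
    return tokens


findings = [
    AgentFinding("stenosis_agent", "L4-L5", "moderate", 0.9),
    AgentFinding("disc_agent", "L4-L5", "herniation", 0.8),
    AgentFinding("fracture_agent", "L1", "present", 0.2),
]
tokens = to_structured_tokens(findings)
```

In a setup like this, the token list would be placed in the report generator's input alongside the image embeddings, so the generated text can be traced back to individual agent outputs.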
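📊 Experimental Highlights
SpineAgent achieves state-of-the-art performance and generalizes well under cross-manufacturer and cross-cohort evaluation. In spine MRI report generation, it achieves leading performance compared with existing methods, as measured by both automated metrics and expert evaluation by five radiologists.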
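🎯 Application Scenarios
Potential application areas include clinical radiology, medical image analysis, and AI-assisted diagnosis. SpineAgent's report generation capability could reduce radiologists' workload and improve diagnostic efficiency, and it may play an important role in large-scale clinical deployment in the future.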
📄 Abstract (Original)
Spinal pathology is a leading cause of pain and disability worldwide. Spine MRI is central to clinical evaluation, yet its interpretation remains complex and time-consuming, requiring integration of information across multiple imaging sequences and anatomical regions. Despite recent advances in automated MRI analysis, effectively combining multi-sequence data while preserving sequence-specific diagnostic information remains an open challenge. Here we present SpineAgent, a multi-agent framework for spine MRI report generation built upon a multi-sequence foundation model trained on routine clinical data from 32,047 patients and 453,683 MRI series, comprising a total of 13,441,191 MRI slices. To accommodate diverse modalities of sequences, we first pre-train two DINOv3-based encoders separately on T1- and T2-weighted sequences. We then introduce a continual training strategy that learns a synthesizer to embed images of other sequences using the T1 and T2 encoders, producing a patient-level embedding that integrates various signals across MRI sequences. Using these embeddings, SpineAgent achieves state-of-the-art performance, and demonstrates strong generalizability under cross-manufacturer and cross-cohort evaluation. Beyond classification, SpineAgent enables pathology localization by identifying findings-relevant slices and segmenting pathological regions. It also supports multimodal image-report retrieval, providing a solid foundation for scalable and explainable MRI report generation. We further integrate these validated capabilities of SpineAgent into 37 specialized agents. Finally, we incorporate their outputs as structured tokens within a Medical Report Agent trained end-to-end for report generation. Through both automated metrics and expert evaluation by five radiologists, SpineAgent achieves leading performance in spine MRI report generation.