BreastGPT: A Multimodal Large Language Model for the Full Spectrum of Breast Cancer Clinical Routine

Authors: Yang Liu, Jiajin Zhang, Danyang Tu, Yaojun Hu, Jiao Qu, Jiuyu Zhang, Yu Shi, Wei Fang, Shi Gu, Ling Zhang, Yingda Xia

Categories: cs.CV, cs.CL

Published: 2026-06-03

🔗 Code/Project: PROJECT_PAGE

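💡 One-Sentence Summary

Proposes BreastGPT, a unified multimodal large language model for multimodal reasoning across breast cancer clinical management.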

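🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: breast cancer, multimodal reasoning, large language models, medical imaging, clinical management, deep learning, dataset construction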

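📋 Key Points

Existing medical multimodal large language models (MLLMs) are constrained by data scarcity and limited model versatility in breast cancer clinical management, which limits their ability to reason at the workflow level.
The paper introduces BreastStage, a corpus of 1.86M instruction-following pairs spanning multiple imaging modalities, built to support reasoning across the full breast cancer clinical workflow.
On BreastStage-Bench, BreastGPT reaches 75.66% closed-ended accuracy and an 89.92% open-ended score, outperforming both general-purpose and medical-specific MLLMs.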

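📝 Abstract (Summary)

Breast cancer remains a leading cause of cancer-related mortality among women, and its clinical management requires multimodal reasoning across screening, diagnosis, and treatment planning. Existing medical multimodal large language models (MLLMs) are typically evaluated on isolated modalities or narrow task families, which limits their use in clinical workflows. The paper first introduces BreastStage, a breast imaging instruction corpus of 1.86M instruction-following pairs covering 5 imaging modalities and 136 task templates. Building on this corpus, it proposes BreastGPT, a unified MLLM equipped with a dual-branch visual encoder and concept-preserving token compression to bridge the scale gap between standard radiology and gigapixel pathology. On BreastStage-Bench, BreastGPT achieves 75.66% closed-ended accuracy and an 89.92% open-ended score, outperforming existing general-purpose and medical-specific MLLMs.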

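🔬 Method Details

Problem definition: The paper targets the lack of multimodal reasoning support in breast cancer clinical management. Existing methods are limited by data scarcity and model versatility, so they cannot effectively support every stage of the clinical workflow.

Core idea: Build the BreastStage corpus, which combines multiple imaging modalities and task templates, and train BreastGPT on it. A dual-branch visual encoder and concept-preserving token compression are used to strengthen multimodal reasoning.

Technical framework: The overall BreastGPT architecture comprises data preprocessing, dual-branch visual encoding, token compression, and a reasoning module, so that inputs from different modalities can be processed and reasoned over in one model.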
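To make the scale gap between standard radiology images and gigapixel pathology slides concrete, the following is a minimal sketch of how inputs might be routed between two visual branches by size. It is not the paper's implementation: the names `ImageInfo`, `route_branch`, and `tile_grid`, the 2048-pixel routing threshold, and the 512-pixel tile size are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImageInfo:
    """Pixel dimensions of an input image (hypothetical container)."""
    width: int
    height: int


def route_branch(img: ImageInfo, max_side: int = 2048) -> str:
    """Pick a branch by image size.

    Assumption: images whose longer side exceeds `max_side` are too large
    to encode in one pass and go to a tiled branch; the rest go to a
    single-pass "global" branch.
    """
    return "tiled" if max(img.width, img.height) > max_side else "global"


def tile_grid(img: ImageInfo, tile: int = 512) -> Tuple[int, int]:
    """Number of tiles (columns, rows) needed to cover the image."""
    return math.ceil(img.width / tile), math.ceil(img.height / tile)


radiograph = ImageInfo(width=1024, height=1024)
slide = ImageInfo(width=40000, height=30000)

cols, rows = tile_grid(slide)
print(route_branch(radiograph), route_branch(slide), cols * rows)
```

Under these assumed settings, a 40,000 × 30,000 slide is covered by 79 × 59 = 4,661 tiles, whereas the radiograph is encoded in a single pass. A tile count of that size is one reason a token compression step would be needed before the language model.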

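Key innovations: The construction of the BreastStage corpus and the design of BreastGPT are the core contributions. In particular, the dual-branch visual encoder lets the model reason over imaging data at very different scales.

Key design: The model uses concept-preserving token compression so that important information is not lost when processing high-resolution images. The training loss configuration is also reported to be tuned for reasoning accuracy, although the abstract does not describe it.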
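The source does not describe how concept-preserving token compression works. As a rough intuition for what token compression does in general, here is a minimal sketch of a generic alternative: keep the highest-scoring visual tokens and fold the rest into one averaged summary token. The function name `compress_tokens` and the use of the L2 norm as a saliency score are assumptions for illustration, not the method used in BreastGPT.

```python
import math
from typing import List

Token = List[float]


def compress_tokens(tokens: List[Token], keep: int) -> List[Token]:
    """Keep the `keep` highest-norm tokens in their original order and
    append one mean token that summarises every dropped token."""
    if keep >= len(tokens):
        return [list(t) for t in tokens]

    # Saliency proxy (assumption): the L2 norm of each token vector.
    scores = [math.sqrt(sum(x * x for x in t)) for t in tokens]

    # Indices sorted by descending score; ties keep their original order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:keep])
    kept_set = set(kept_idx)

    # Average the dropped tokens so their content is not discarded outright.
    dropped = [tokens[i] for i in range(len(tokens)) if i not in kept_set]
    dim = len(tokens[0])
    summary = [sum(t[d] for t in dropped) / len(dropped) for d in range(dim)]

    return [list(tokens[i]) for i in kept_idx] + [summary]


tokens = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0], [6.0, 8.0]]
print(compress_tokens(tokens, keep=2))
# → [[3.0, 4.0], [6.0, 8.0], [0.5, 0.5]]
```

Four tokens are reduced to three: the two with the largest norms survive unchanged, and the two low-norm tokens are merged into `[0.5, 0.5]`. A real system would typically score tokens with learned attention or similarity to concept embeddings rather than a raw norm.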

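📊 Experimental Highlights

On BreastStage-Bench, BreastGPT achieves 75.66% closed-ended accuracy and an 89.92% open-ended score, outperforming existing general-purpose and medical-specific MLLMs. The authors take this as evidence that workflow-aligned data and cross-scale visual modeling are critical for clinically grounded reasoning.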
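The scoring protocol of BreastStage-Bench is not described in the source. For readers unfamiliar with the metric, closed-ended accuracy is commonly computed as the percentage of questions whose predicted answer matches the reference after light normalization. The sketch below assumes that exact-match definition; `normalize` and `closed_ended_accuracy` are hypothetical helper names.

```python
from typing import List


def normalize(answer: str) -> str:
    """Lowercase, strip surrounding whitespace, and drop a trailing period."""
    return answer.strip().lower().rstrip(".")


def closed_ended_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions that exactly match their reference answer
    after normalization (an assumed scoring rule, not the benchmark's)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


preds = ["Malignant", "b", "yes.", "benign"]
refs = ["malignant", "B", "no", "benign"]
print(closed_ended_accuracy(preds, refs))
# → 75.0
```

Open-ended answers cannot be scored by exact match, which is why benchmarks report them separately, usually with a graded or model-judged score.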

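🎯 Application Scenarios

Potential applications include early breast cancer screening, diagnostic support, and personalized treatment planning. BreastGPT's multimodal reasoning could help clinicians integrate different types of imaging data when making decisions, which may in turn improve treatment outcomes. In the future, the approach could be extended to the management of other cancers and to broader areas of medicine.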

📄 Abstract (Original)

Breast cancer remains a leading cause of cancer-related mortality among women. Its clinical management requires multimodal reasoning across a clinical workflow that spans screening, diagnosis and treatment planning, where each stage involves distinct imaging modalities, task objectives, and reasoning patterns. However, constrained by data scarcity and model versatility, existing medical MLLMs are typically evaluated on isolated modalities or narrow task families, limiting their ability to support workflow-level clinical reasoning. In this work, we first introduce BreastStage, a workflow-aligned breast imaging instruction corpus comprising 1.86M instruction-following pairs curated from 17 sub-datasets across 5 imaging modalities and 136 task templates. Its held-out split, BreastStage-Bench, provides a comprehensive benchmark for evaluating multimodal reasoning across the breast cancer care continuum. Building on this corpus, we propose BreastGPT, a unified MLLM equipped with a dual-branch visual encoder and concept-preserving token compression to bridge the scale gap between standard radiology and gigapixel pathology. On BreastStage-Bench, BreastGPT achieves 75.66% closed-ended accuracy and 89.92% open-ended score, outperforming both general-purpose and medical-specific MLLMs across clinical stages and task formats. These results suggest that workflow-aligned data and cross-scale visual modeling are critical for clinically grounded medical MLLMs. All data, code, and model checkpoints are released at https://yangyy-liu.github.io/BreastGPT.io.
