VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

Category: cs.CV

Published: 2024-06-07 (updated: 2024-11-22)

🔗 Code/Project: GITHUB

💡 One-Sentence Takeaway

VISTA3D: a unified segmentation foundation model for 3D medical imaging

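🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: 3D medical image segmentation, foundation model, interactive segmentation, zero-shot learning, supervoxel, deep learning, medical image analysis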

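📋 Key Points

Existing methods struggle to combine automation, interactive correction, and zero-shot generalization in 3D medical image segmentation, especially on complex 3D structures.
VISTA3D jointly optimizes automatic segmentation, interactive segmentation, and zero-shot learning within one unified framework, aiming at a clinically practical 3D foundation model.
VISTA3D reaches state-of-the-art results on both 3D automatic and 3D interactive segmentation, supports 127 classes, and uses a 3D supervoxel method to improve zero-shot performance.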

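📝 Abstract (Translated from Chinese)

Interactive segmentation foundation models for 2D natural images and videos have sparked strong interest in building foundation models for 3D medical imaging. However, the domain gaps and clinical use cases of 3D medical imaging call for a dedicated model that differs from existing 2D solutions. Specifically, such a foundation model should support a complete workflow that genuinely reduces human effort. Treating a 3D medical image as a sequence of 2D slices and reusing an interactive 2D foundation model seems simple, but 2D annotation is too time-consuming for 3D tasks. Moreover, for large cohort analysis, highly accurate automatic segmentation models are what reduce human effort the most. Yet these models lack support for interactive correction and lack zero-shot ability on novel structures, which is a key trait of a "foundation" model. Reusing pretrained 2D backbones in 3D improves zero-shot potential, but their performance on complex 3D structures still lags behind leading 3D models. To address these issues, the authors propose VISTA3D, the Versatile Imaging SegmenTation and Annotation model, which aims to meet all of these challenges and requirements with one unified foundation model. VISTA3D is built on a well-established 3D segmentation pipeline and is the first model to achieve state-of-the-art performance in both 3D automatic segmentation (supporting 127 classes) and 3D interactive segmentation, even when compared with top 3D expert models on large and diverse benchmarks. In addition, VISTA3D's 3D interactive design allows efficient human correction, and a novel 3D supervoxel method that distills 2D pretrained backbones gives VISTA3D top 3D zero-shot performance. The authors believe the model, recipe, and insights are a promising step toward a clinically useful 3D foundation model. Code and weights are publicly available at https://github.com/Project-MONAI/VISTA.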

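🔬 Method Details

Problem definition: Existing 3D medical image segmentation methods usually focus on automatic segmentation and lack interactive correction, which makes them a poor fit for real clinical needs. Their zero-shot generalization is also limited, so they cannot handle new anatomical structures effectively. Extending 2D segmentation models to 3D is feasible, but slice-by-slice 2D annotation is costly and performance on complex 3D structures is limited.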

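Core idea: VISTA3D builds a single unified 3D segmentation foundation model that supports high-accuracy automatic segmentation, efficient interactive correction, and strong zero-shot generalization at the same time. It pursues this by combining a 3D segmentation pipeline, an interactive design, and knowledge transfer from 2D pretrained backbones.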

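Technical framework: VISTA3D is built on a standard 3D segmentation pipeline and contains three main modules: 1) a 3D segmentation network for automatic segmentation; 2) an interactive segmentation module that lets the user correct results through clicks and similar inputs; 3) a 3D supervoxel module that distills knowledge from 2D pretrained backbones to improve zero-shot performance. The overall flow is to run automatic segmentation first, let the user apply interactive corrections afterwards, and use supervoxel information to improve segmentation accuracy.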
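The "automatic first, then interactive correction" flow described above can be sketched with toy stand-ins. This is only an illustration of the control flow under stated assumptions: `auto_segment`, `apply_click`, `segment_with_corrections`, the `(voxel, positive)` click format, and the threshold and radius values are all hypothetical and are not taken from the VISTA3D code; a real model would re-predict the mask from the clicks rather than paint a cube.

```python
import numpy as np

def auto_segment(volume, threshold=0.5):
    """Toy stand-in for the automatic branch: a plain intensity threshold."""
    return volume > threshold

def apply_click(mask, click, positive, radius=1):
    """Toy stand-in for interactive refinement (hypothetical).

    Sets a small cube around the clicked voxel to foreground (positive click)
    or background (negative click).
    """
    z, y, x = click
    out = mask.copy()
    out[
        max(z - radius, 0): z + radius + 1,
        max(y - radius, 0): y + radius + 1,
        max(x - radius, 0): x + radius + 1,
    ] = positive
    return out

def segment_with_corrections(volume, clicks):
    """Run the automatic pass first, then apply each user click in order."""
    mask = auto_segment(volume)
    for click, positive in clicks:
        mask = apply_click(mask, click, positive)
    return mask

# Synthetic volume with one bright cube standing in for an organ.
volume = np.zeros((8, 8, 8), dtype=np.float32)
volume[2:5, 2:5, 2:5] = 1.0
clicks = [((6, 6, 6), True),    # user adds a region the automatic pass missed
          ((2, 2, 2), False)]   # user removes a region marked by mistake
mask = segment_with_corrections(volume, clicks)
```

The point of the sketch is the ordering: the automatic result is the starting point, and every click edits the current mask instead of restarting the segmentation.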

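Key innovations: 1) a unified framework that supports automatic segmentation, interactive correction, and zero-shot learning together; 2) a 3D supervoxel method that makes effective use of knowledge from 2D pretrained backbones to improve 3D zero-shot performance; 3) an interactive design tailored to 3D medical images that makes human correction more efficient.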
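As a rough illustration of what working at supervoxel granularity can look like, the sketch below averages per-voxel features inside each supervoxel. It assumes a precomputed integer label volume and a per-voxel feature array; how VISTA3D actually generates supervoxels and distills the 2D backbone is not specified in this summary, so the function name `pool_features_by_supervoxel` and the input layout are hypothetical.

```python
import numpy as np

def pool_features_by_supervoxel(features, labels):
    """Average per-voxel features within each supervoxel.

    features : (D, H, W, C) float array (assumed layout, e.g. features
               from a 2D backbone stacked slice by slice)
    labels   : (D, H, W) int array of supervoxel ids in [0, K)
    returns  : (K, C) array holding the mean feature of each supervoxel
    """
    flat_labels = labels.reshape(-1)
    flat_feats = features.reshape(-1, features.shape[-1])
    num_sv = int(flat_labels.max()) + 1
    sums = np.zeros((num_sv, flat_feats.shape[1]), dtype=np.float64)
    # Unbuffered scatter-add so repeated ids accumulate correctly.
    np.add.at(sums, flat_labels, flat_feats)
    counts = np.bincount(flat_labels, minlength=num_sv).astype(np.float64)
    # Guard against ids in [0, K) that never occur in the volume.
    return sums / np.maximum(counts, 1.0)[:, None]

# Tiny example: two supervoxels, one per slab along the first axis.
features = np.arange(8, dtype=np.float64).reshape(2, 2, 2, 1)
labels = np.zeros((2, 2, 2), dtype=np.int64)
labels[1] = 1
pooled = pool_features_by_supervoxel(features, labels)
```

Pooling over supervoxels gives one descriptor per region instead of one per voxel, which is one plausible reason region-level grouping helps when transferring features computed on 2D slices into 3D.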

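Key design: VISTA3D uses a standard 3D segmentation network architecture (the specific architecture is not given here) and designs an interactive correction method for 3D data (the specific method is not given here). The central element is how 3D supervoxels are generated and used: aggregating 3D voxels into supervoxels makes it easier to distill features from 2D pretrained backbones and to bring them into the 3D segmentation network. For the loss, common segmentation losses such as Dice loss may have been used, with adjustments for interactive correction and zero-shot learning (details are not given here).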
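For readers unfamiliar with the Dice loss mentioned above, here is a minimal NumPy sketch of the standard soft Dice formulation, 1 - 2|P∩G| / (|P| + |G|). It is a generic textbook version; the summary only says a Dice-style loss may have been used, so the smoothing constant `eps` and the single-class, whole-volume reduction are assumptions rather than the paper's settings.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one foreground class over a whole 3D volume.

    probs  : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask with the same shape
    eps    : small smoothing term (assumed value) to avoid division by zero
    """
    probs = np.asarray(probs, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(probs * target)
    denom = np.sum(probs) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

# A perfect prediction gives a loss near 0; a fully wrong one gives a loss near 1.
target = np.zeros((4, 4, 4))
target[1:3, 1:3, 1:3] = 1.0
loss_perfect = soft_dice_loss(target, target)
loss_wrong = soft_dice_loss(1.0 - target, target)
```

Because the loss is normalized by the predicted and true foreground sizes, it is less dominated by background voxels than a plain voxel-wise loss, which matters when the target structure fills only a small part of the volume.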

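📊 Experimental Highlights

VISTA3D achieves state-of-the-art performance on multiple 3D medical image segmentation datasets, in both automatic and interactive segmentation. Compared with existing 3D expert models, it is reported to improve both accuracy and efficiency. Its zero-shot performance is also reported to exceed that of other methods, so it can handle new anatomical structures. The paper presents the detailed metrics and margins (exact figures are not given here).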

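🎯 Application Scenarios

VISTA3D has broad potential in medical image analysis, where it can assist clinicians with diagnosis, treatment planning, and assessment of treatment response. By offering high-accuracy automatic segmentation together with efficient interactive correction, it can substantially reduce clinicians' manual workload and improve diagnostic efficiency and accuracy. Its zero-shot generalization also lets it adapt quickly to new anatomical structures and disease types, which gives it clear clinical value.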

📄 Abstract (Original)

Foundation models for interactive segmentation in 2D natural images and videos have sparked significant interest in building 3D foundation models for medical imaging. However, the domain gaps and clinical use cases for 3D medical imaging require a dedicated model that diverges from existing 2D solutions. Specifically, such foundation models should support a full workflow that can actually reduce human effort. Treating 3D medical images as sequences of 2D slices and reusing interactive 2D foundation models seems straightforward, but 2D annotation is too time-consuming for 3D tasks. Moreover, for large cohort analysis, it's the highly accurate automatic segmentation models that reduce the most human effort. However, these models lack support for interactive corrections and lack zero-shot ability for novel structures, which is a key feature of "foundation". While reusing pre-trained 2D backbones in 3D enhances zero-shot potential, their performance on complex 3D structures still lags behind leading 3D models. To address these issues, we present VISTA3D, Versatile Imaging SegmenTation and Annotation model, that targets to solve all these challenges and requirements with one unified foundation model. VISTA3D is built on top of the well-established 3D segmentation pipeline, and it is the first model to achieve state-of-the-art performance in both 3D automatic (supporting 127 classes) and 3D interactive segmentation, even when compared with top 3D expert models on large and diverse benchmarks. Additionally, VISTA3D's 3D interactive design allows efficient human correction, and a novel 3D supervoxel method that distills 2D pretrained backbones grants VISTA3D top 3D zero-shot performance. We believe the model, recipe, and insights represent a promising step towards a clinically useful 3D foundation model. Code and weights are publicly available at https://github.com/Project-MONAI/VISTA.
