STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation
Authors: Won June Cho, Daeky Jeong, Hyeongyeol Lim, Hongjun Yoon
Categories: cs.CV, cs.AI, cs.CE, cs.LG
Published: 2026-06-05
Comments: 27 pages, 7 figures
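💡 One-Sentence Summary
Proposes the STREAM framework to address conditioning collapse in digital histopathology image generation.
🎯 Matched Areas: Pillar 2: RL Algorithms and Architecture; Pillar 9: Embodied Foundation Models
Keywords: digital pathology, image generation, deep learning, Riemannian flow matching, anisotropic decoder, synthetic images, computational pathology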
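📋 Key Points
- Existing generative models that use pretrained vision foundation models (VFMs) as conditioning signals are prone to "conditioning collapse": the conditioning signal dominates the latent space and degrades the quality and diversity of generated samples.
- The paper proposes STREAM, which uses a pretrained histopathology VFM as the latent space itself and introduces stochastic Riemannian flow matching and an anisotropic decoder.
- STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets.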
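📝 Abstract (Summary)
Synthetic histopathology image generation addresses key challenges in computational pathology, including patient privacy and the need for large-scale training data. Existing generative models use pretrained vision foundation models as conditioning signals, which leads to "conditioning collapse" and reduces the quality and diversity of generated samples. To address this, the paper proposes STREAM, which uses a pretrained histopathology vision foundation model as the latent space and combines stochastic Riemannian flow matching with an anisotropic decoder, improving reconstruction and generation performance on breast and colorectal cancer datasets.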
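🔬 Method Details
Problem definition: The paper targets conditioning collapse in digital histopathology image generation. Existing methods that feed pretrained VFM features in as conditioning signals tend to degrade the quality and diversity of generated samples.
Core idea: Instead of using a pretrained histopathology VFM as a conditioning signal, STREAM uses its patch-token features, which encode rich semantic information, as the latent space itself. This removes the separate conditioning signal that would otherwise dominate the latent space.
Technical framework: STREAM has two stages. (1) A bridge-type stochastic perturbation establishes per-token rectifiability on the unit hypersphere $\mathcal{S}^{d-1}$ and is used to train a Diffusion Transformer (DiT) in the latent space. (2) An anisotropic decoder allocates robustness to the low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions.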
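The first stage can be illustrated with standard spherical geometry. The following is a minimal sketch and not the authors' implementation. It assumes the bridge-type perturbation is Brownian-bridge-style tangent noise with standard deviation $\sigma\sqrt{t(1-t)}$ around the geodesic between a source token $x_0$ and a target token $x_1$. It also assumes the standard geodesic conditional target $\log_{x_t}(x_1)/(1-t)$ from Riemannian flow matching, because the summary does not state STREAM's exact regression target for the perturbed path. The function names and `sigma` are hypothetical.

```python
import numpy as np


def normalize(x, eps=1e-12):
    """l2-normalize along the last axis so each token lies on S^{d-1}."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def project_tangent(x, v):
    """Drop the radial component of v so it lies in the tangent space at x."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x


def log_map(x, y, eps=1e-7):
    """Sphere log map: tangent vector at x whose length is the geodesic distance to y."""
    cos = np.clip(np.sum(x * y, axis=-1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)
    u = y - cos * x                                   # part of y orthogonal to x
    u_norm = np.linalg.norm(u, axis=-1, keepdims=True)
    return np.where(u_norm > eps, theta * u / np.maximum(u_norm, eps), 0.0)


def exp_map(x, v, eps=1e-7):
    """Sphere exp map: follow the geodesic from x in direction v for length |v|."""
    v_norm = np.linalg.norm(v, axis=-1, keepdims=True)
    y = np.cos(v_norm) * x + np.sin(v_norm) * v / np.maximum(v_norm, eps)
    return normalize(np.where(v_norm > eps, y, x))


def bridge_sample(x0, x1, t, sigma, rng):
    """Hypothetical bridge perturbation: geodesic point at time t plus tangent
    Gaussian noise whose scale sigma * sqrt(t * (1 - t)) vanishes at t = 0 and t = 1."""
    mu_t = exp_map(x0, t * log_map(x0, x1))
    noise = project_tangent(mu_t, rng.standard_normal(x0.shape))
    return exp_map(mu_t, sigma * np.sqrt(t * (1.0 - t)) * noise)


def target_velocity(xt, x1, t):
    """Geodesic conditional target from Riemannian flow matching: log_{x_t}(x_1) / (1 - t)."""
    return log_map(xt, x1) / (1.0 - t)


rng = np.random.default_rng(0)
n, d = 4, 8                                          # 4 tokens of dimension 8 (toy sizes)
x0 = normalize(rng.standard_normal((n, d)))          # source tokens on the sphere
x1 = normalize(rng.standard_normal((n, d)))          # stand-ins for VFM patch tokens
t = rng.uniform(0.0, 0.99, size=(n, 1))              # per-token times, kept below 1
xt = bridge_sample(x0, x1, t, sigma=0.1, rng=rng)
v = target_velocity(xt, x1, t)                       # regression target for the DiT
```

A DiT would then be trained to regress `v` from `(xt, t)`. The noise vanishes at both endpoints, so each per-token path is pinned at $x_0$ and $x_1$. That is one plausible reading of "bridge-type," not a confirmed detail of the paper.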
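Key innovation: STREAM is the first framework to apply Riemannian flow matching in the pathology domain. It exploits the geometry of the latent space to improve the quality of generated images.
Key design: The patch-token features are $\ell_2$-normalized and therefore lie on the unit hypersphere $\mathcal{S}^{d-1}$. The authors show empirically that they exhibit strong angular dominance and intrinsic curvature, which motivates the Riemannian formulation. The decoder is designed around the spectrum of the velocity-field Jacobian and treats its low-energy and high-energy directions differently.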
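One way to read "allocating robustness to low-energy directions" is as anisotropic noise expressed in the singular basis of the Jacobian. The sketch below makes several assumptions:

- `f` is a toy linear velocity field. It is hypothetical; in STREAM this role would be played by the trained DiT.
- The Jacobian is estimated by central finite differences.
- Energy is taken as the squared singular value.

Given these, the sketch perturbs an $\ell_2$-normalized token with small noise along the `k_high` highest-energy directions and larger noise along the rest, then retracts the result back onto $\mathcal{S}^{d-1}$. The names `anisotropic_perturb`, `sigma_low`, `sigma_high`, and `k_high` are illustrative, not the paper's API.

```python
import numpy as np


def jacobian_fd(f, x, h=1e-5):
    """Central finite-difference Jacobian of f: R^d -> R^d at x (column i = df/dx_i)."""
    d = x.shape[0]
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        J[:, i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J


def anisotropic_perturb(f, x, sigma_low, sigma_high, k_high, rng):
    """Hypothetical anisotropic perturbation of a unit-norm token x.

    Input directions are ranked by the singular values of J = df/dx (energy ~ s**2).
    The k_high highest-energy directions get small noise (sigma_high) to keep fidelity;
    the remaining low-energy directions get larger noise (sigma_low) for robustness.
    """
    J = jacobian_fd(f, x)
    _, s, Vt = np.linalg.svd(J)                     # s is sorted in descending order
    scales = np.where(np.arange(s.size) < k_high, sigma_high, sigma_low)
    delta = Vt.T @ (scales * rng.standard_normal(s.size))  # noise in the singular basis
    delta -= np.dot(delta, x) * x                   # keep the tangent component at x
    y = x + delta
    return y / np.linalg.norm(y)                    # retract onto S^{d-1}


rng = np.random.default_rng(0)
A = np.diag([10.0, 5.0, 1.0, 0.5, 0.1, 0.01])       # toy linear velocity field (hypothetical)


def f(x):
    return A @ x


x = rng.standard_normal(A.shape[0])
x /= np.linalg.norm(x)                              # l2-normalized token on S^{d-1}
y = anisotropic_perturb(f, x, sigma_low=0.2, sigma_high=0.01, k_high=2, rng=rng)
```

Forming a dense $d \times d$ Jacobian per token is expensive at realistic VFM feature dimensions. A practical variant would more likely rely on Jacobian-vector products or a low-rank estimate of the leading directions.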
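📊 Experimental Highlights
STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets, improving the quality and diversity of generated samples over existing methods. The abstract does not report specific numbers; the code will be publicly released upon acceptance.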
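🎯 Application Scenarios
The work has broad potential in digital pathology. It can supply high-quality synthetic images for medical image analysis, which helps researchers obtain large-scale training data while protecting patient privacy. It can also support the training and application of foundation models.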
📄 Abstract (Original)
Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated images. Existing state-of-the-art generative models in histopathology use pretrained Vision Foundation Models (VFMs) as conditioning signals, and we observe that this leads to "conditioning collapse," where the conditioning signal dominates the latent space and lowers the quality and diversity of generated samples. Therefore, we instead use pretrained histopathology VFMs as the latent space itself, leveraging their patch-token features that encode rich semantic information. We empirically show that these features are $\ell_2$-normalized and lie on the unit hypersphere $\mathcal{S}^{d-1}$ with strong angular dominance and intrinsic curvature, making them naturally suited for a Riemannian formulation. We therefore present STREAM, the first framework to apply Riemannian flow matching in the pathology domain. STREAM consists of two stages: 1) a bridge-type stochastic perturbation that establishes per-token rectifiability on $\mathcal{S}^{d-1}$ for training a Diffusion Transformer (DiT) in latent space, and 2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. Together, STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets. The code will be publicly released upon acceptance.
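Generation with a Riemannian flow-matching model integrates the learned tangent velocity field from $t=0$ to $t=1$ on $\mathcal{S}^{d-1}$. The sketch below uses geodesic Euler steps (exponential-map updates), a common choice for sphere-valued flows. The abstract does not specify STREAM's sampler, so this is an assumption, and `toy_velocity` is a hypothetical stand-in for a trained DiT.

```python
import numpy as np


def log_map(x, y, eps=1e-7):
    """Sphere log map: tangent vector at x pointing toward y, with length = geodesic distance."""
    cos = np.clip(np.sum(x * y, axis=-1, keepdims=True), -1.0, 1.0)
    u = y - cos * x
    u_norm = np.linalg.norm(u, axis=-1, keepdims=True)
    return np.where(u_norm > eps, np.arccos(cos) * u / np.maximum(u_norm, eps), 0.0)


def exp_map(x, v, eps=1e-7):
    """Sphere exp map: move from x along the geodesic with initial velocity v."""
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    y = np.where(n > eps, np.cos(n) * x + np.sin(n) * v / np.maximum(n, eps), x)
    return y / np.linalg.norm(y, axis=-1, keepdims=True)


def to_tangent(x, v):
    """Remove the radial component so v lies in the tangent space at x."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x


def sample_sphere_flow(velocity, x0, n_steps=50):
    """Integrate dx/dt = velocity(x, t) on S^{d-1} from t=0 to t=1 with geodesic Euler steps."""
    x = x0 / np.linalg.norm(x0, axis=-1, keepdims=True)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = to_tangent(x, velocity(x, k * dt))   # guard against radial drift in the prediction
        x = exp_map(x, dt * v)
    return x


rng = np.random.default_rng(0)
x1 = rng.standard_normal((3, 8))
x1 /= np.linalg.norm(x1, axis=-1, keepdims=True)   # fixed target tokens (toy)
x0 = rng.standard_normal((3, 8))                   # initial noise, normalized inside the sampler


def toy_velocity(x, t):
    """Stand-in for a trained DiT: the exact geodesic field toward x1."""
    return log_map(x, x1) / (1.0 - t)


samples = sample_sphere_flow(toy_velocity, x0, n_steps=20)
```

With the exact conditional field toward fixed tokens, each step covers $1/(N-k)$ of the remaining geodesic distance, so the last step lands on `x1`. That property makes the toy field a convenient sanity check for the integrator.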