SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery

Authors: Caleb S. Spradlin, Jordan A. Caraballo-Vega, Jian Li, Mark L. Carroll, Jie Gong, Paul M. Montesano

Categories: cs.CV, cs.AI, cs.LG

Published: 2024-11-26

Comments: 19 pages, 5 figures

💡 One-Sentence Summary

SatVision-TOA: a geospatial foundation model for coarse-resolution, all-sky remote sensing imagery

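🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: remote sensing imagery, foundation model, self-supervised learning, masked image modeling, SwinV2, top-of-atmosphere radiance, cloud detection, land surface monitoring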

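📋 Key Points

Existing foundation models mainly target high-resolution, cloud-free satellite imagery, which limits their use in scenarios that require frequent temporal monitoring or broad spectral analysis.
SatVision-TOA uses masked image modeling and the SwinV2 architecture for self-supervised pre-training on large-scale MODIS TOA data, learning contextual representations.
Experiments show that SatVision-TOA outperforms baseline methods on downstream tasks such as 3D cloud retrieval, raising mIOU from 0.22 to 0.46 and cutting the false-negative rate by more than 50%.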

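📝 Abstract (Translated)

This paper presents SatVision-TOA, a new foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery to meet the need for models pre-trained on moderate- and coarse-resolution all-sky remote sensing data. SatVision-TOA is pre-trained with a Masked-Image-Modeling (MIM) framework and the SwinV2 architecture, and learns detailed contextual representations through self-supervised learning, without labels. The model has 3 billion parameters and is trained on 100 million images; to the authors' knowledge, it is the largest foundation model trained solely on satellite remote sensing imagery. Results show that SatVision-TOA outperforms baseline methods on downstream tasks such as 3D cloud retrieval. Notably, the model achieves a mean intersection over union (mIOU) of 0.46, a substantial improvement over the baseline mIOU of 0.22. In addition, the rate of false negatives in the fine-tuning task is reduced by more than 50% relative to the baseline. By learning from a variety of atmospheric and aerosol conditions, this work advances pre-trained vision models for multispectral remote sensing and improves cloud and land surface monitoring.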

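🔬 Method Details

Problem definition: Most existing foundation models in remote sensing target high-resolution, cloud-free satellite imagery or photos. This limits their effectiveness in applications that require frequent temporal monitoring, involve atmospheric variables, or need atmospheric correction. A pre-trained model that can handle moderate- and coarse-resolution, all-sky remote sensing data is therefore needed.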

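Core idea: SatVision-TOA is pre-trained in a self-supervised manner on large-scale MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery. By learning from TOA radiance data, the model can better capture the effects of atmospheric conditions and aerosols, which improves performance on tasks such as cloud and land surface monitoring.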

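Technical framework: SatVision-TOA is built on Masked-Image-Modeling (MIM) and the SwinV2 architecture. First, parts of the input image are randomly masked. The model then tries to reconstruct the masked regions from the unmasked ones, and in doing so learns contextual representations of the image. After pre-training, the model can be fine-tuned for specific downstream tasks.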
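The random masking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_patch_mask`, the 8 x 8 patch grid, and the 0.6 mask ratio are hypothetical choices, since this summary does not state the actual patch layout or mask ratio.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=None):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Hypothetical helper for illustration only. The mask ratio used by
    SatVision-TOA is not given in this summary.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_patches = grid_h * grid_w
    n_masked = int(round(n_patches * mask_ratio))
    # Choose which patch indices to hide, uniformly at random without replacement.
    masked = set(rng.sample(range(n_patches), n_masked))
    return [
        [(r * grid_w + c) in masked for c in range(grid_w)]
        for r in range(grid_h)
    ]


# Assumed values: an 8 x 8 patch grid with 60% of patches masked.
mask = random_patch_mask(8, 8, mask_ratio=0.6, seed=0)
n_masked = sum(cell for row in mask for cell in row)
print(f"{n_masked} of {8 * 8} patches masked")
```

During pre-training, the encoder would see the image with the `True` patches hidden and be asked to predict their pixel values; a fresh mask is typically drawn for every training sample.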

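Key innovation: SatVision-TOA is the first foundation model pre-trained specifically on moderate- and coarse-resolution, all-sky remote sensing data. Because it uses TOA radiance data, the model can learn the effects of atmospheric conditions and aerosols, which sets it apart from earlier models that focus mainly on cloud-free imagery.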

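Key design: SatVision-TOA is a large model with 3 billion parameters, trained on a dataset of 100 million images. It uses SwinV2, an efficient Transformer architecture suited to high-resolution images. The loss function is a standard image reconstruction loss, such as an L1 or L2 loss. The mask ratio is an important hyperparameter that needs to be tuned to the characteristics of the dataset. The specific values are not stated in this summary.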
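A reconstruction loss of the kind mentioned above can be sketched in a few lines. This is an assumption-laden illustration: the summary only says "L1 or L2", so the function `masked_reconstruction_loss`, the choice to average over masked positions only (a common convention in masked image modeling), and the toy values are all hypothetical rather than taken from the paper.

```python
def masked_reconstruction_loss(pred, target, mask, kind="l1"):
    """Mean L1 or L2 error over masked positions only.

    pred, target: flat lists of floats of equal length.
    mask: flat list of bools, True where the input was masked.
    Hypothetical helper; restricting the loss to masked positions is an
    assumption, not something this summary confirms.
    """
    if kind not in ("l1", "l2"):
        raise ValueError("kind must be 'l1' or 'l2'")
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")
    errors = []
    for p, t, m in zip(pred, target, mask):
        if m:
            diff = p - t
            errors.append(abs(diff) if kind == "l1" else diff * diff)
    if not errors:
        raise ValueError("mask selects no positions")
    return sum(errors) / len(errors)


# Toy values: the first position is visible, so its error is ignored.
pred = [0.5, 1.0, 2.0, 4.0]
target = [0.0, 1.0, 3.0, 2.0]
mask = [False, True, True, True]

l1 = masked_reconstruction_loss(pred, target, mask, kind="l1")  # (0 + 1 + 2) / 3
l2 = masked_reconstruction_loss(pred, target, mask, kind="l2")  # (0 + 1 + 4) / 3
print(l1, l2)
```

L1 penalizes errors linearly and is less sensitive to outlier pixels, while L2 penalizes large errors more heavily; which of the two the model uses is left open here.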

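📊 Experimental Highlights

SatVision-TOA delivers a clear performance gain on the 3D cloud retrieval task. The model achieves a mean intersection over union (mIOU) of 0.46, compared with 0.22 for the baseline. In addition, the false-negative rate in the fine-tuning task is reduced by more than 50% relative to the baseline. These results indicate that SatVision-TOA learns contextual information from remote sensing data effectively and transfers it to downstream tasks.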
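For readers unfamiliar with the metric, the sketch below shows one common way to compute mIOU for a segmentation-style prediction, plus the ratio implied by the reported scores. The function `mean_iou`, the toy labels, and the convention of skipping classes absent from both prediction and target are assumptions for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection over union for flat label lists.

    Classes that appear in neither pred nor target are skipped
    (an assumed convention; other conventions exist).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in pred or target")
    return sum(ious) / len(ious)


# Toy binary example (1 = cloud, 0 = clear); the values are made up.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
miou = mean_iou(pred, target, num_classes=2)

# Ratio of the reported scores: 0.46 for SatVision-TOA vs 0.22 for the baseline.
relative_gain = 0.46 / 0.22
print(f"toy mIOU = {miou:.2f}, reported ratio = {relative_gain:.2f}x")
```

In the toy example each class has two correctly labeled pixels out of a union of four, so both per-class IoUs are 0.5. The reported scores correspond to roughly a twofold increase in mIOU over the baseline.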

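🎯 Application Scenarios

SatVision-TOA has broad potential in climate change research, environmental monitoring, and agricultural remote sensing. It can be used to improve the accuracy of tasks such as cloud detection, land cover classification, and aerosol retrieval. By capturing atmospheric conditions and land surface features more accurately, SatVision-TOA can give decision-makers more reliable information in support of sustainable development.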

📄 Abstract (Original)

Foundation models have the potential to transform the landscape of remote sensing (RS) data analysis by enabling large computer vision models to be pre-trained on vast amounts of remote sensing data. These models can then be fine-tuned with small amounts of labeled training and applied to a variety of applications. Most existing foundation models are designed for high spatial resolution, cloud-free satellite imagery or photos, limiting their applicability in scenarios that require frequent temporal monitoring or broad spectral profiles. As a result, foundation models trained solely on cloud-free images have limited utility for applications that involve atmospheric variables or require atmospheric corrections. We introduce SatVision-TOA, a novel foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery, addressing the need for models pre-trained to handle moderate- and coarse-resolution all-sky remote sensing data. The SatVision-TOA model is pre-trained using a Masked-Image-Modeling (MIM) framework and the SwinV2 architecture, and learns detailed contextual representations through self-supervised learning without the need for labels. It is a 3 billion parameter model that is trained on 100 million images. To our knowledge this is the largest foundation model trained solely on satellite RS imagery. Results show that SatVision-TOA achieves superior performance over baseline methods on downstream tasks such as 3D cloud retrieval. Notably, the model achieves a mean intersection over union (mIOU) of 0.46, a substantial improvement over the baseline mIOU of 0.22. Additionally, the rate of false negative results in the fine-tuning task was reduced by over 50% compared to the baseline. Our work advances pre-trained vision modeling for multispectral RS by learning from a variety of atmospheric and aerosol conditions to improve cloud and land surface monitoring.
