A probabilistic foundation model for crystal structure denoising, phase classification, and order parameters

Authors: Hyuna Kwon, Babak Sadigh, Sebastien Hamel, Vincenzo Lordi, John Klepeis, Fei Zhou

Categories: cond-mat.mtrl-sci, cs.AI

Published: 2025-12-11 (updated: 2025-12-21)

💡 One-sentence summary

Proposes a probabilistic foundation model for crystal structure denoising and phase classification.

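🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: crystal structure, denoising, phase classification, order parameters, probabilistic model, materials science, machine learning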

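📋 Key points

Existing methods for extracting phase labels and order parameters from crystal structures lack universality and robustness, and they break down under strong thermal disorder and defects.
The paper proposes a log-probability foundation model that unifies denoising, phase classification, and order parameter extraction within a single probabilistic framework.
Experiments show that the model is universal and robust across hundreds of prototypes and handles complex systems accurately.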

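📝 Abstract (translated from Chinese)

Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a universal, robust, and interpretable way remains challenging. Existing tools such as PTM and CNA are limited to a small set of hand-crafted lattices, perform poorly under strong thermal disorder or defects, and cannot provide per-atom probability or confidence scores. The paper proposes a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. The authors reuse the MACE-MP foundation interatomic potential, training it to predict per-atom, per-phase logits and to aggregate them into a global log-density, whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on the learned log-density, phase labels are obtained by maximizing over the logits, and the logit values act as continuous, defect-sensitive, and interpretable OPs that quantify the Euclidean distance to ideal phases. The authors demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice-water interfaces, and shock-compressed titanium.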

🔬 Method details

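Problem definition: The paper addresses the challenge of extracting phase labels, order parameters, and defect information from crystal structures produced by atomistic simulations. Existing methods such as PTM and CNA perform poorly under strong thermal disorder and defects, and they do not provide per-atom probability or confidence scores.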

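Core idea: The paper proposes a log-probability foundation model. It reuses the MACE-MP foundation interatomic potential and trains it to predict per-atom, per-phase logits, which are then aggregated into a global log-density. This design lets denoising, phase classification, and order parameter extraction all operate within the same framework.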
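As a compact restatement of the quantities named in the original abstract, the relationships can be written as below. The symbols $F$ and $\boldsymbol{s}$ are notation introduced here for illustration only: $F$ is a placeholder for the aggregation rule that maps logits to the global log-density, which this summary does not specify.

$$
\log \hat{P}_\theta(\boldsymbol{r}) = F\big(\{ l_{ac}(\boldsymbol{r}) \}\big), \qquad
\boldsymbol{s}(\boldsymbol{r}) = \nabla_{\boldsymbol{r}} \log \hat{P}_\theta(\boldsymbol{r}), \qquad
\oint \boldsymbol{s} \cdot d\boldsymbol{r} = 0 .
$$

The last identity is what "conservative" means here: because the score field $\boldsymbol{s}$ is the gradient of a single scalar function, its line integral around any closed path vanishes.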

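Technical framework: The overall architecture covers three main tasks: denoising, phase classification, and order parameter extraction. Denoising is performed by gradient ascent on the learned log-density, phase labels are obtained by taking the largest logit for each atom, and the logit values themselves serve as interpretable order parameters.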
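The three tasks can be illustrated with a toy stand-in for the learned model. The sketch below is not the paper's network: it assumes analytic logits $l_{ac} = -|\boldsymbol{r}_a - \boldsymbol{t}_{c,a}|^2 / (2\sigma^2)$ around fixed template positions, and it assumes a log-sum-exp aggregation into the global log-density. All function names, the two "phases", and the parameters `sigma` and `step` are hypothetical choices made for illustration.

```python
import numpy as np

def toy_logits(r, templates, sigma=0.1):
    """Toy per-atom, per-phase logits: negative squared distance to each template site."""
    # r: (n_atoms, dim); templates: (n_phases, n_atoms, dim)
    d2 = np.sum((r[None, :, :] - templates) ** 2, axis=-1)  # (n_phases, n_atoms)
    return -d2.T / (2.0 * sigma**2)                          # (n_atoms, n_phases)

def log_density(r, templates, sigma=0.1):
    """Assumed aggregation: sum over atoms of log-sum-exp over phases."""
    l = toy_logits(r, templates, sigma)
    m = l.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.sum(np.exp(l - m), axis=1))))

def score(r, templates, sigma=0.1):
    """Analytic gradient of log_density with respect to the positions r."""
    l = toy_logits(r, templates, sigma)
    w = np.exp(l - l.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # softmax over phases, (n_atoms, n_phases)
    diff = r[None, :, :] - templates         # (n_phases, n_atoms, dim)
    return -np.einsum("ac,cad->ad", w, diff) / sigma**2

def denoise(r, templates, sigma=0.1, step=None, n_steps=50):
    """Plain gradient ascent on the toy log-density."""
    step = 0.5 * sigma**2 if step is None else step
    r = r.copy()
    for _ in range(n_steps):
        r = r + step * score(r, templates, sigma)
    return r

# Two hypothetical "phases": a 4x4 square grid and a stretched, shifted copy of it.
grid = np.array([[x, y] for x in range(4) for y in range(4)], dtype=float)
templates = np.stack([grid, 1.3 * grid + 0.4])

rng = np.random.default_rng(0)
noisy = templates[1] + rng.normal(scale=0.05, size=grid.shape)  # thermal-like noise

clean = denoise(noisy, templates)
order_params = toy_logits(clean, templates)   # logits double as order parameters
labels = order_params.argmax(axis=1)          # per-atom phase label
```

In this toy setting a logit is, up to scale, the negative squared Euclidean distance to an ideal site, so values near zero indicate an atom sitting close to the ideal position of that phase.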

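Key innovation: The main innovation is the unification of denoising, phase classification, and order parameter extraction within a single probabilistic framework. This overcomes the limitations of existing methods and yields a probability score for every atom.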
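One common way to turn per-atom logits into probability and confidence scores is a softmax over phases. The sketch below assumes that convention; the summary does not state how the paper normalizes its logits, and the logit values and the helper name `per_atom_probabilities` are made up for illustration. The phase names follow the FCC/BCC/HCP example in the original abstract.

```python
import numpy as np

def per_atom_probabilities(logits):
    """Softmax over the phase axis; logits has shape (n_atoms, n_phases)."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

phase_names = ["FCC", "BCC", "HCP"]

# Hypothetical logits for three atoms (rows) and three phases (columns).
logits = np.array([
    [4.0, 0.5, -1.0],   # clearly FCC-like
    [1.0, 0.9, 0.8],    # ambiguous, e.g. an atom near a defect
    [-2.0, 3.0, 0.0],   # clearly BCC-like
])

probs = per_atom_probabilities(logits)
labels = probs.argmax(axis=1)      # hard label per atom
confidence = probs.max(axis=1)     # probability of the chosen label

for a, (c, conf) in enumerate(zip(labels, confidence)):
    print(f"atom {a}: {phase_names[c]} (confidence {conf:.2f})")
```

Unlike a hard template match, the ambiguous second atom still receives a label but with a visibly low confidence, which is the kind of per-atom information the summary says PTM and CNA do not provide.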

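Key design: The model is trained to predict per-atom, per-phase logits and to aggregate them into a global log-density. The loss function and network architecture are designed to improve the model's robustness and universality.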

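📊 Experimental highlights

Experiments show that the model is universal and robust across hundreds of prototypes and copes with strong thermal and defect-induced disorder. The summary reports markedly better accuracy in phase classification and order parameter extraction than existing methods, but gives no specific performance figures.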

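🎯 Application scenarios

Potential application areas include materials science, chemistry, and physics, where the model can help researchers analyze and understand the properties and behavior of complex crystal structures more accurately. In the longer term, it may support the discovery and optimization of new materials and make materials design more efficient and accurate.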

📄 Abstract (original)

Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and interpretable remains challenging. Existing tools such as PTM and CNA are restricted to a small set of hand-crafted lattices (e.g. FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probability or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We reuse the MACE-MP foundation interatomic potential on crystal structures mapped to AFLOW prototypes, training it to predict per-atom, per-phase logits $l$ and to aggregate them into a global log-density $\log \hat{P}_\theta(\boldsymbol{r})$ whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the $l$ values act as continuous, defect-sensitive and interpretable OPs quantifying the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice-water interfaces, and shock-compressed Ti.
