Using large language models to produce literature reviews: Usages and systematic biases of microphysics parametrizations in 2699 publications

Authors: Tianhang Zhang, Shengnan Fu, David M. Schultz, Zhonghua Zheng

Categories: cs.AI, stat.AP

Published: 2025-03-27

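💡 One-sentence takeaway

Using large language models to analyze the meteorological literature: revealing how microphysics parameterization schemes are used and how they are biased

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: large language models, literature review, microphysics parameterization, WRF model, precipitation simulation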

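📋 Key points

In meteorological research, systematically analyzing a large body of literature to learn how model parameterization schemes are used, and how they are biased, is difficult.
This paper uses the large language model GPT-4 Turbo to extract model configuration and performance information from a large set of meteorological publications and builds a literature review from it.
The results show how different microphysics parameterization schemes are used around the world and what their systematic biases are, providing a reference for meteorological research.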

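📝 Abstract (translated)

This paper shows how a large language model can be used to build a literature review of microphysics parameterization schemes in the WRF model, in order to learn how they are used and what systematic biases they show when simulating precipitation. The researchers built a database of 2699 publications from Web of Science and Scopus searches. The large language model GPT-4 Turbo was used to extract information about model configuration and performance from the text of these publications. The results reveal how nine of the most popular microphysics parameterization schemes (Lin, Ferrier, WRF Single-Moment, Goddard Cumulus Ensemble, Morrison, Thompson, and WRF Double-Moment) have been used around the world. More studies used one-moment schemes before 2020, and more used two-moment schemes after 2020. Seven of the nine schemes tended to overestimate precipitation. However, the systematic biases of the schemes differed between regions. Except for the Lin, Ferrier, and Goddard schemes, which tended to underestimate precipitation over almost all locations, the remaining six schemes tended to overestimate, particularly over China, Southeast Asia, the western United States, and central Africa. Other researchers could use this method to learn how artificial intelligence can help them work with the increasingly massive body of scientific literature to address their own research questions.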

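🔬 Method details

Problem definition: The paper addresses how to efficiently extract, from a large body of meteorological literature, how the microphysics parameterization schemes of one specific model (WRF) are used and what systematic biases they show. Existing approaches rely on manual reading and analysis, which is slow and does not scale to a large literature.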

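Core idea: The paper uses the natural language processing ability of a large language model (LLM) to automatically extract key information from publications, such as model configuration, parameterization scheme, and performance evaluation. Statistical analysis of the extracted information then reveals the usage patterns and systematic biases of the different schemes.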
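The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names (`scheme`, `region`, `precip_bias`), the prompt wording, and the `ask_llm` callable are all assumptions, and the model call is replaced by a stub so the example runs without any API.

```python
import json
from typing import Callable

# Hypothetical field names; the paper's actual extraction schema is not given here.
FIELDS = ("scheme", "region", "precip_bias")


def build_prompt(text: str) -> str:
    """Ask for a JSON object containing only the fields we want to tabulate."""
    return (
        "From the WRF study below, return a JSON object with the keys "
        f"{', '.join(FIELDS)}. Use null when the text does not say.\n\n{text}"
    )


def extract_record(text: str, ask_llm: Callable[[str], str]) -> dict:
    """Send one publication's text to the model and parse its JSON reply."""
    reply = ask_llm(build_prompt(text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the expected keys so a malformed reply cannot add columns.
    return {key: data.get(key) for key in FIELDS}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: no API is contacted)."""
    return '{"scheme": "Thompson", "region": "China", "precip_bias": "over"}'


record = extract_record("placeholder publication text", fake_llm)
print(record)
```

Passing the model as a plain callable keeps the parsing and validation logic testable on its own; a real client would be wrapped in a function with the same signature.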

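Technical framework: The study proceeds in four main steps. (1) Build the literature database: search databases such as Web of Science and Scopus for publications related to microphysics parameterization schemes in the WRF model. (2) Extract information: use a large language model (GPT-4 Turbo) to pull key information from the text of each publication, such as the parameterization scheme used, the model configuration, and the performance evaluation results. (3) Analyze the data: run statistical analyses on the extracted information, for example how often each scheme is used and in which direction its precipitation estimates are biased. (4) Visualize the results: present the analysis as figures and charts.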
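The data-analysis step of this framework can be sketched with the standard library alone. The records below are made-up toy values for illustration, not results from the paper, and the field names and the simple majority rule for a scheme's bias tendency are assumptions.

```python
from collections import Counter, defaultdict

# Toy records with made-up values, for illustration only (not data from the paper).
records = [
    {"scheme": "Thompson", "precip_bias": "over"},
    {"scheme": "Thompson", "precip_bias": "over"},
    {"scheme": "Thompson", "precip_bias": "under"},
    {"scheme": "Lin", "precip_bias": "under"},
    {"scheme": "Lin", "precip_bias": None},  # bias not reported in this study
]


def usage_counts(rows):
    """Count how many studies used each scheme."""
    return Counter(row["scheme"] for row in rows if row.get("scheme"))


def bias_tendency(rows):
    """Label each scheme by the majority bias direction among its studies."""
    tallies = defaultdict(Counter)
    for row in rows:
        if row.get("scheme") and row.get("precip_bias") in ("over", "under"):
            tallies[row["scheme"]][row["precip_bias"]] += 1
    tendency = {}
    for scheme, tally in tallies.items():
        if tally["over"] > tally["under"]:
            tendency[scheme] = "over"
        elif tally["under"] > tally["over"]:
            tendency[scheme] = "under"
        else:
            tendency[scheme] = "mixed"
    return tendency


usage = usage_counts(records)
tendency = bias_tendency(records)
print(usage.most_common())
print(tendency)
```

Studies that do not report a bias still count toward usage but are left out of the bias tally, so a missing value never shifts a scheme's tendency.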

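Key innovation: The study applies a large language model to the systematic analysis of meteorological literature. Compared with traditional manual review, this approach handles a large literature efficiently and extracts key information automatically, giving meteorological research a new perspective and method.

Key design: The main design choices are: (1) using GPT-4 Turbo as the information-extraction tool, relying on its natural language processing ability; (2) building a large database of 2699 publications to support the reliability of the analysis; (3) analyzing the extracted information statistically to reveal the usage patterns and systematic biases of the different schemes; (4) focusing on nine of the most popular microphysics parameterization schemes in the WRF model, which gives the results practical value.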

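📊 Experimental highlights

The results show that more studies used one-moment schemes before 2020 and more used two-moment schemes after 2020. Seven of the nine schemes tended to overestimate precipitation. The systematic biases differed between regions: the Lin, Ferrier, and Goddard schemes tended to underestimate precipitation over almost all locations, while the other six schemes tended to overestimate, particularly over China, Southeast Asia, the western United States, and central Africa.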
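Breakdowns of this kind (by period and by region) amount to grouped counts over the extracted records. The sketch below uses made-up toy records, not the paper's data. The field names are hypothetical, and treating 2020 itself as part of the later period is an assumption, since the summary only says "before" and "after" 2020.

```python
from collections import Counter

# Toy records with made-up values, for illustration only (not data from the paper).
records = [
    {"year": 2012, "moments": 1, "region": "China", "precip_bias": "over"},
    {"year": 2016, "moments": 1, "region": "China", "precip_bias": "over"},
    {"year": 2018, "moments": 2, "region": "central Africa", "precip_bias": "under"},
    {"year": 2021, "moments": 2, "region": "China", "precip_bias": "over"},
    {"year": 2023, "moments": 2, "region": "central Africa", "precip_bias": "over"},
]


def period(year, split=2020):
    """Assign a study to a period; the split year goes to the later period (assumption)."""
    return "before" if year < split else "from"


def moments_by_period(rows, split=2020):
    """Count studies per (period, number of moments) pair."""
    return Counter((period(row["year"], split), row["moments"]) for row in rows)


def overestimate_share(rows):
    """Fraction of studies per region that report overestimated precipitation."""
    over, total = Counter(), Counter()
    for row in rows:
        if row["precip_bias"] in ("over", "under"):
            total[row["region"]] += 1
            over[row["region"]] += int(row["precip_bias"] == "over")
    return {region: over[region] / total[region] for region in total}


by_period = moments_by_period(records)
share = overestimate_share(records)
print(by_period)
print(share)
```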

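🎯 Application scenarios

The method can be applied to literature reviews and knowledge discovery in other scientific fields, for example analyzing trends in climate-change research, assessing the efficacy of different drugs, or tracking the direction of artificial intelligence research. It can help researchers make more efficient use of the vast scientific literature and speed up research.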

📄 Abstract (original)

Large language models afford opportunities for using computers for intensive tasks, realizing research opportunities that have not been considered before. One such opportunity could be a systematic interrogation of the scientific literature. Here, we show how a large language model can be used to construct a literature review of 2699 publications associated with microphysics parametrizations in the Weather and Research Forecasting (WRF) model, with the goal of learning how they were used and their systematic biases, when simulating precipitation. The database was constructed of publications identified from Web of Science and Scopus searches. The large language model GPT-4 Turbo was used to extract information about model configurations and performance from the text of 2699 publications. Our results reveal the landscape of how nine of the most popular microphysics parameterizations have been used around the world: Lin, Ferrier, WRF Single-Moment, Goddard Cumulus Ensemble, Morrison, Thompson, and WRF Double-Moment. More studies used one-moment parameterizations before 2020 and two-moment parameterizations after 2020. Seven out of nine parameterizations tended to overestimate precipitation. However, systematic biases of parameterizations differed in various regions. Except simulations using the Lin, Ferrier, and Goddard parameterizations that tended to underestimate precipitation over almost all locations, the remaining six parameterizations tended to overestimate, particularly over China, southeast Asia, western United States, and central Africa. This method could be used by other researchers to help understand how the increasingly massive body of scientific literature can be harnessed through the power of artificial intelligence to solve their research problems.
