ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema

Authors: Fei Wang, Yuewen Zheng, Qin Li, Jingyi Wu, Pengfei Li, Luxia Zhang

Category: cs.CL

Published: 2024-07-26

💡 One-Sentence Summary

ChatSchema: a schema-based pipeline for extracting medical information with Large Multimodal Models

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: medical information extraction, Large Multimodal Models, schema, structured information, Optical Character Recognition

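📋 Key Points

Extracting information from medical reports means dealing with unstructured data, and existing methods struggle to use domain knowledge for precise extraction.
ChatSchema uses a predefined schema to guide Large Multimodal Models, so that information in medical reports is extracted in structured form and standardized.
Experiments show that ChatSchema performs well on both key extraction and value extraction and clearly outperforms the Baseline.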

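📝 Abstract (translated)

This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical reports. The method combines Large Multimodal Models (LMMs) with schema-based Optical Character Recognition (OCR). By integrating a predefined schema, it aims to let LMMs extract and standardize information directly according to the schema specifications, which facilitates further data entry. The approach has two stages, classification and extraction, which categorize the report scenario and structure the information. A dataset was established and annotated to verify the effectiveness of ChatSchema, and key extraction was evaluated with precision, recall, F1-score, and accuracy. Value extraction was then assessed on top of key extraction. Ablation studies on two LMMs illustrate how different input modalities and methods improve structured information extraction. The results show higher overall performance for GPT-4o.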

🔬 Method Details

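Problem definition: Medical reports contain large amounts of unstructured text and image data, and extracting the key information from them accurately and efficiently is difficult. Existing methods typically rely on manual annotation or rule engines, which are costly and generalize poorly. How to use domain knowledge, such as a predefined schema, to guide extraction is a further pain point.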

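Core idea: ChatSchema uses a predefined schema as prior knowledge to guide Large Multimodal Models (LMMs) during extraction. Because the schema is part of the model input, the model knows which types of information to extract and in what format, which improves both accuracy and efficiency. The approach resembles filling in a form: the schema defines the structure of the form, and the LMM fills in its contents.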
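The form-filling analogy can be made concrete with a minimal sketch. The schema layout, the field names, and the `build_extraction_prompt` helper below are all hypothetical illustrations; the summary does not specify how ChatSchema actually encodes its schema or words its prompts.

```python
import json

# Hypothetical schema for a blood-test report; field names are illustrative only.
BLOOD_TEST_SCHEMA = {
    "report_type": "blood_test",
    "fields": [
        {"key": "hemoglobin", "unit": "g/L", "type": "number"},
        {"key": "white_blood_cell_count", "unit": "10^9/L", "type": "number"},
        {"key": "sample_date", "unit": None, "type": "date (YYYY-MM-DD)"},
    ],
}


def build_extraction_prompt(schema: dict) -> str:
    """Render the schema as an empty 'form' that the model is asked to fill in."""
    empty_form = {field["key"]: None for field in schema["fields"]}
    lines = [
        f"Report type: {schema['report_type']}",
        "Fill in the JSON form below using only values found in the report.",
        "Use null for any field that is not present. Expected fields:",
    ]
    for field in schema["fields"]:
        unit = f", unit {field['unit']}" if field["unit"] else ""
        lines.append(f"- {field['key']} ({field['type']}{unit})")
    lines.append("Form:")
    lines.append(json.dumps(empty_form, indent=2))
    return "\n".join(lines)


prompt = build_extraction_prompt(BLOOD_TEST_SCHEMA)
print(prompt)
```

The resulting prompt text would be sent to the LMM together with the report image or its OCR text; constraining the output to the listed keys is what keeps the model from improvising its own field names.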

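Technical framework: ChatSchema has two main stages, classification and extraction. First, an LMM classifies the medical report to determine its type and scenario. Then the schema matching that report type is selected, and the LMM extracts the keys and their corresponding values from the report. The whole process works as a pipeline in which the LMM and the schema cooperate to structure the information.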
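A minimal sketch of the classify-then-extract flow is shown below. It assumes the model is any callable that maps a prompt string to a response string; `SCHEMAS`, `classify`, `extract`, `run_pipeline`, and `fake_model` are hypothetical names, and the stand-in model replaces a real GPT-4o or Gemini call so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical schemas keyed by report scenario (illustrative key names).
SCHEMAS = {
    "blood_test": ["hemoglobin", "platelet_count"],
    "urinalysis": ["urine_protein", "urine_ph"],
}


def classify(report: str, model: Callable[[str], str]) -> str:
    """Stage 1: ask the model which scenario the report belongs to."""
    options = ", ".join(sorted(SCHEMAS))
    answer = model(f"Classify this report as one of: {options}.\n\n{report}")
    label = answer.strip().lower()
    if label not in SCHEMAS:
        raise ValueError(f"unknown report type: {label!r}")
    return label


def extract(report: str, label: str, model: Callable[[str], str]) -> dict:
    """Stage 2: extract only the keys listed in the selected schema."""
    keys = SCHEMAS[label]
    answer = model(f"Return JSON with exactly these keys: {keys}.\n\n{report}")
    raw = json.loads(answer)
    # Drop anything outside the schema; keys the model omitted become None.
    return {key: raw.get(key) for key in keys}


def run_pipeline(report: str, model: Callable[[str], str]) -> tuple:
    label = classify(report, model)
    return label, extract(report, label, model)


def fake_model(prompt: str) -> str:
    """Stand-in for an LMM call so the sketch runs without network access."""
    if prompt.startswith("Classify"):
        return "Blood_Test\n"
    return '{"hemoglobin": 132, "comment": "not in schema"}'


label, record = run_pipeline("Hemoglobin 132 g/L", fake_model)
print(label, record)
```

Filtering the model output against the schema keys in `extract` is one simple way to enforce the schema after the fact; whether ChatSchema does this or relies on the prompt alone is not stated in the summary.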

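Key innovation: ChatSchema feeds the schema to the LMM as input, which makes the extraction schema-driven. This takes advantage of the LMM's capabilities while avoiding the errors that arise when the model is left to improvise. The two-stage design further improves the accuracy and efficiency of extraction.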

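Key design: The method is evaluated with two LMMs, GPT-4o and Gemini 1.5 Pro, and their performance is compared. The dataset consists of 100 medical reports from Peking University First Hospital, annotated with 2,945 key-value pairs. The evaluation metrics are precision, recall, F1-score, and accuracy. Ablation studies on both LMMs show how different input modalities and methods affect structured information extraction.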

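📊 Experimental Highlights

With GPT-4o, key extraction reached a precision of 98.6%, a recall of 98.5%, and an F1-score of 98.6%. Value extraction, evaluated on correctly extracted keys, reached an overall accuracy of 97.2%, with precision, recall, and F1-score all at 95.8%. In the ablation study, ChatSchema improved on the Baseline by 26.9% in overall accuracy and 27.4% in overall F1-score for key-value extraction.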
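For readers unfamiliar with these metrics, the sketch below computes them for a single report using the standard set-based definitions. This is an assumption: the summary does not give the paper's exact formulas, and the `key_value_metrics` helper and the sample data are made up for illustration.

```python
def key_value_metrics(predicted: dict, truth: dict) -> dict:
    """Key precision/recall/F1, plus value accuracy on correctly extracted keys."""
    pred_keys, true_keys = set(predicted), set(truth)
    correct_keys = pred_keys & true_keys

    precision = len(correct_keys) / len(pred_keys) if pred_keys else 0.0
    recall = len(correct_keys) / len(true_keys) if true_keys else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    # Values are only compared where the key itself was extracted correctly.
    matched = sum(predicted[k] == truth[k] for k in correct_keys)
    value_accuracy = matched / len(correct_keys) if correct_keys else 0.0

    return {
        "key_precision": precision,
        "key_recall": recall,
        "key_f1": f1,
        "value_accuracy": value_accuracy,
    }


# Made-up example: one spurious key, one missed key, one wrong value.
truth = {"hemoglobin": "132", "platelet_count": "210",
         "urine_ph": "6.5", "sample_date": "2024-01-05"}
predicted = {"hemoglobin": "132", "platelet_count": "201",
             "urine_ph": "6.5", "patient_name": "unknown"}

metrics = key_value_metrics(predicted, truth)
print(metrics)
```

In this example three of the four predicted keys are correct and three of the four true keys are found, so key precision, recall, and F1 are all 0.75, while two of the three matched keys carry the right value.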

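🎯 Application Scenarios

ChatSchema can be applied to medical information management, clinical decision support, and medical research. Automatically extracting and structuring the information in medical reports can speed up data entry, reduce manual errors, and give physicians more complete and accurate patient information. In the future, the method could be applied to larger medical datasets and integrated with other medical information systems.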

📄 Abstract (original)

Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) based on the schema. By integrating predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schema specifications, facilitating further data entry. Method: Our approach involves a two-stage process, including classification and extraction for categorizing report scenarios and structuring information. We established and annotated a dataset to verify the effectiveness of ChatSchema, and evaluated key extraction using precision, recall, F1-score, and accuracy metrics. Based on key extraction, we further assessed value extraction. We conducted ablation studies on two LMMs to illustrate the improvement of structured information extraction with different input modals and methods. Result: We analyzed 100 medical reports from Peking University First Hospital and established a ground truth dataset with 2,945 key-value pairs. We evaluated ChatSchema using GPT-4o and Gemini 1.5 Pro and found a higher overall performance of GPT-4o. The results are as follows: For the result of key extraction, key-precision was 98.6%, key-recall was 98.5%, key-F1-score was 98.6%. For the result of value extraction based on correct key extraction, the overall accuracy was 97.2%, precision was 95.8%, recall was 95.8%, and F1-score was 95.8%. An ablation study demonstrated that ChatSchema achieved significantly higher overall accuracy and overall F1-score of key-value extraction, compared to the Baseline, with increases of 26.9% overall accuracy and 27.4% overall F1-score, respectively.
