Prompting Large Language Models for Clinical Temporal Relation Extraction

Authors: Jianping He, Laila Rasmy, Haifang Li, Jianfu Li, Zenan Sun, Evan Yu, Degui Zhi, Cui Tao

Categories: cs.CL, cs.LG

Published: 2024-12-04

💡 One-Sentence Summary

Uses prompting to improve the performance of large language models on clinical temporal relation extraction.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: clinical temporal relation extraction, large language models, prompting, fine-tuning, clinical text processing

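📋 Key Points

Existing clinical temporal relation extraction methods fall short on complex clinical text and cannot fully exploit the knowledge in large pretrained language models.
The paper uses prompting, combined with full fine-tuning and parameter-efficient fine-tuning strategies, to guide large language models in clinical temporal relation extraction.
Experiments show that the proposed methods significantly improve CTRE performance in the fully supervised setting, surpassing the existing SOTA model; the paper also analyzes the few-shot setting.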

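📝 Abstract (Translated from Chinese)

This paper explores how to prompt large language models (LLMs) for clinical temporal relation extraction (CTRE) in both few-shot and fully supervised settings. The study uses four LLMs: the Encoder-based GatorTron-Base (345M) and GatorTron-Large (8.9B), and the Decoder-based LLaMA3-8B and MeLLaMA-13B. Full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT) strategies were developed and evaluated on the 2012 i2b2 CTRE task. Four fine-tuning strategies were explored for GatorTron-Base: (1) standard fine-tuning, (2) hard prompting with unfrozen LLMs, (3) soft prompting with frozen LLMs, and (4) low-rank adaptation (LoRA) with frozen LLMs. For GatorTron-Large, two PEFT strategies were assessed, soft prompting and LoRA with frozen LLMs, both leveraging quantization. In addition, LLaMA3-8B and MeLLaMA-13B employed two PEFT strategies: LoRA with quantization (QLoRA) applied to frozen LLMs, using instruction tuning and standard fine-tuning. Under the fully supervised setting, hard prompting with unfrozen GatorTron-Base achieved the highest F1 score (89.54%), surpassing the SOTA model (85.70%) by 3.74%. Two variants of QLoRA adapted to GatorTron-Large and standard fine-tuning of GatorTron-Base exceeded the SOTA model by 2.36%, 1.88%, and 0.25%, respectively. Decoder-based models with frozen parameters outperformed their Encoder-based counterparts in this setting; in few-shot scenarios, however, the trend reversed. The study shows that choosing an appropriate model and fine-tuning strategy is critical for CTRE performance and benefits downstream tasks that rely on CTRE systems. Future work will explore larger models and broader CTRE applications.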

🔬 Method Details

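Problem definition: The paper addresses clinical temporal relation extraction (CTRE), that is, identifying and extracting the temporal relations between events in clinical text. Existing methods, especially traditional feature-engineering approaches, generalize poorly across datasets and domains and cannot make effective use of the knowledge in large pretrained language models.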

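Core idea: The paper uses prompting: well-designed prompts guide large language models (LLMs) to understand the CTRE task and apply their language modeling ability to relation extraction. It also explores different fine-tuning strategies, including full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT), to suit different compute budgets and amounts of data.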

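Technical framework: The overall pipeline has five main steps: 1) select a pretrained language model, such as GatorTron-Base/Large or LLaMA3-8B/MeLLaMA-13B; 2) design a prompt template that recasts CTRE as a text generation or classification task the language model can handle; 3) choose a fine-tuning strategy, such as full fine-tuning, hard prompting, soft prompting, or LoRA; 4) train and evaluate on the i2b2 CTRE dataset; 5) analyze the performance differences across models and fine-tuning strategies.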
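Step 2 of the pipeline can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's actual templates: the `RelationInstance` type, the template wording, the `[MASK]` token, and the three-way label set (BEFORE, AFTER, OVERLAP) are hypothetical. It shows one cloze-style template, as might suit an Encoder-based model, and one instruction-style template, as might suit a Decoder-based model.

```python
from dataclasses import dataclass

# Assumed label set for illustration; the summary does not list the labels.
LABELS = ("BEFORE", "AFTER", "OVERLAP")


@dataclass
class RelationInstance:
    """Hypothetical container for one candidate event pair."""
    text: str    # clinical sentence containing both events
    source: str  # first event mention
    target: str  # second event mention


def build_hard_prompt(inst: RelationInstance, mask_token: str = "[MASK]") -> str:
    """Append a hand-written cloze template; the model would fill the mask."""
    return (
        f"{inst.text} "
        f'The temporal relation between "{inst.source}" '
        f'and "{inst.target}" is {mask_token}.'
    )


def build_instruction_prompt(inst: RelationInstance) -> str:
    """Format the same instance as an instruction for a generative model."""
    options = ", ".join(LABELS)
    return (
        "Instruction: Classify the temporal relation between two clinical events.\n"
        f"Options: {options}\n"
        f"Text: {inst.text}\n"
        f"Event 1: {inst.source}\n"
        f"Event 2: {inst.target}\n"
        "Answer:"
    )


example = RelationInstance(
    text="The patient was admitted after developing chest pain.",
    source="admitted",
    target="chest pain",
)
print(build_hard_prompt(example))
print(build_instruction_prompt(example))
```

Keeping the template in a plain function makes it easy to swap wordings when comparing prompt designs, which is the kind of variation a hard-prompting study would need to control.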

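Key contributions: 1) a systematic exploration of prompting for the CTRE task; 2) a comparison of different LLM types (Encoder-based vs. Decoder-based) and fine-tuning strategies (FFT vs. PEFT); 3) the use of methods such as hard prompting and quantized LoRA to further improve CTRE performance.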

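Key design choices: For fine-tuning, the paper explores the following: 1) hard prompting: modify the input text directly by adding task-related instructions; 2) soft prompting: introduce learnable prompt embeddings that are concatenated with the input text; 3) LoRA: update model parameters through a low-rank decomposition, reducing computation; 4) quantization: lower the precision of model parameters to reduce memory usage. The paper also uses instruction tuning to improve the performance of the Decoder-based models.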
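The LoRA and quantization ideas above can be sketched in a few lines of pure Python. This is an illustrative toy under stated assumptions, not the paper's implementation: the helper names, matrix sizes, and the `alpha / r` scaling convention are chosen for the example, and the absmax integer quantizer is a simplified stand-in for the 4-bit scheme that QLoRA actually uses.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A. W stays frozen; only A and B would train."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


def quantize_absmax(values, bits=8):
    """Map floats to signed integers using a single absmax scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in values) / qmax) or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


# Toy sizes, hypothetical and far smaller than any real model layer.
d_out, d_in, r, alpha = 4, 6, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so W is unchanged at first

w_adapted = lora_weight(w, a, b, alpha, r)
print(trainable_params(d_out, d_in, r), "trainable vs", d_out * d_in, "frozen")

codes, scale = quantize_absmax([0.5, -1.0, 0.25])
print(codes, scale)
```

At these toy sizes the adapter is almost as large as the frozen matrix; the savings appear only when the rank is much smaller than both dimensions, for example r = 8 with 4096 x 4096 weights gives 65,536 trainable parameters against 16,777,216 frozen ones.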

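📊 Experimental Highlights

Under the fully supervised setting, hard prompting with unfrozen GatorTron-Base achieved the highest F1 score (89.54%), surpassing the SOTA model (85.70%) by 3.74%. Two variants of QLoRA adapted to GatorTron-Large and standard fine-tuning of GatorTron-Base exceeded the SOTA model by 2.36%, 1.88%, and 0.25%, respectively. Decoder-based models with frozen parameters outperformed Encoder-based models in this setting.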

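🎯 Application Scenarios

The results can be applied to medical information extraction, clinical decision support, and electronic health record analysis. Accurate clinical temporal relation extraction helps clinicians better understand how a patient's condition has progressed, improving the efficiency of diagnosis and treatment. The technique may later extend to a wider range of clinical text analysis tasks, such as adverse drug reaction monitoring and disease prediction.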

📄 Abstract (Original)

Objective: This paper aims to prompt large language models (LLMs) for clinical temporal relation extraction (CTRE) in both few-shot and fully supervised settings. Materials and Methods: This study utilizes four LLMs: Encoder-based GatorTron-Base (345M)/Large (8.9B); Decoder-based LLaMA3-8B/MeLLaMA-13B. We developed full (FFT) and parameter-efficient (PEFT) fine-tuning strategies and evaluated these strategies on the 2012 i2b2 CTRE task. We explored four fine-tuning strategies for GatorTron-Base: (1) Standard Fine-Tuning, (2) Hard-Prompting with Unfrozen LLMs, (3) Soft-Prompting with Frozen LLMs, and (4) Low-Rank Adaptation (LoRA) with Frozen LLMs. For GatorTron-Large, we assessed two PEFT strategies-Soft-Prompting and LoRA with Frozen LLMs-leveraging Quantization techniques. Additionally, LLaMA3-8B and MeLLaMA-13B employed two PEFT strategies: LoRA strategy with Quantization (QLoRA) applied to Frozen LLMs using instruction tuning and standard fine-tuning. Results: Under fully supervised settings, Hard-Prompting with Unfrozen GatorTron-Base achieved the highest F1 score (89.54%), surpassing the SOTA model (85.70%) by 3.74%. Additionally, two variants of QLoRA adapted to GatorTron-Large and Standard Fine-Tuning of GatorTron-Base exceeded the SOTA model by 2.36%, 1.88%, and 0.25%, respectively. Decoder-based models with frozen parameters outperformed their Encoder-based counterparts in this setting; however, the trend reversed in few-shot scenarios. Discussions and Conclusions: This study presented new methods that significantly improved CTRE performance, benefiting downstream tasks reliant on CTRE systems. The findings underscore the importance of selecting appropriate models and fine-tuning strategies based on task requirements and data availability. Future work will explore larger models and broader CTRE applications.
