Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation
Authors: Hisashi Johno, Yuki Johno, Akitomo Amakawa, Junichi Sato, Ryota Tozuka, Atsushi Komaba, Hiroaki Watanabe, Hiroki Watanabe, Chihiro Goto, Hiroyuki Morisaka, Hiroshi Onishi, Kazunori Nakamoto
Category: cs.CL
Published: 2025-03-19
Comments: 11 pages, 6 figures, 2 tables, 6 supplementary files
💡 One-sentence summary
Retrieval-augmented generation improves the accuracy of LLM-based pancreatic cancer staging.
🎯 Matched area: Pillar 9: Embodied Foundation Models
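Keywords: pancreatic cancer staging, large language models, retrieval-augmented generation, clinical decision support, medical imaging analysis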
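📋 Key points
- Without external knowledge, LLMs have limited accuracy on medical staging tasks, and their outputs lack the transparency physicians need to trust them.
- A retrieval-augmented generation (RAG) framework retrieves the relevant medical guideline passages and supplies them to the LLM as external knowledge, improving staging accuracy.
- In experiments, the RAG-LLM clearly outperformed LLMs without RAG on pancreatic cancer staging and presented the retrieved evidence, which makes its answers easier to trust.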
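📝 Summary
This study evaluates how retrieval-augmented generation (RAG) affects the accuracy of large language models (LLMs) in pancreatic cancer staging. By combining an LLM with relevant information retrieved from reliable external knowledge (REK), RAG aims to make the LLM more capable and reliable. The study compared three configurations, REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK), on staging 100 fictional pancreatic cancer cases from CT findings, assessing TNM classification, local invasion factors, and resectability classification. REK+/RAG+ achieved markedly higher staging accuracy than the other two configurations, suggesting that RAG may improve LLM staging accuracy. In addition, REK+/RAG+ explicitly presents the retrieved REK excerpts, giving physicians transparency and highlighting its applicability to clinical diagnosis and classification.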
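🔬 Method details
Problem definition: The paper addresses the limited accuracy of large language models (LLMs) in pancreatic cancer staging. Existing approaches rely on the LLM's own internal knowledge without support from external medical guidelines, so the resulting stages may be inaccurate and lack the transparency physicians need to trust them.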
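Core idea: Use a RAG framework to combine the LLM with an external knowledge base (a summary of Japan's pancreatic cancer staging guidelines). Retrieving the guideline passages relevant to each case gives the LLM the background knowledge it needs, which improves staging accuracy.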
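Technical framework: The framework has three main components: 1) the reliable external knowledge (REK), a summary of Japan's pancreatic cancer staging guidelines; 2) a retrieval module that retrieves the relevant REK passages based on the input CT findings; and 3) a generation module in which the LLM stages the cancer from the CT findings together with the supplied guideline content. Only REK+/RAG+ (NotebookLM with REK) uses the retrieval module; REK+/RAG- (Gemini 2.0 Flash with REK) receives the REK without a retrieval step, and REK-/RAG- (Gemini 2.0 Flash without REK) receives no REK at all. Because Gemini 2.0 Flash is NotebookLM's internal LLM, the three configurations differ in how they access knowledge rather than in the underlying model.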
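To make the difference between the three configurations concrete, here is a minimal Python sketch of how the prompt could be assembled in each case. It is a hypothetical illustration, not NotebookLM's implementation, which is not public: the REK is assumed to be pre-split into text chunks, retrieval is a toy keyword-overlap ranker standing in for a real embedding-based retriever, the names `retrieve`, `build_prompt`, `rek`, and `findings` are invented for this example, the REK chunks are placeholders rather than text from the Japanese guidelines, and the actual LLM call is omitted.

```python
import re

# Hypothetical sketch: names, chunks, and the retriever are illustrative only.


def tokenize(text: str) -> set[str]:
    # Lowercased word tokens; a toy stand-in for an embedding model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank REK chunks by the number of tokens they share with the query; keep the top k.
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(ct_findings: str, config: str, rek_chunks: list[str]) -> str:
    task = "Stage this pancreatic cancer case (TNM, local invasion factors, resectability)."
    if config == "REK-/RAG-":
        context = ""  # the model relies only on its internal knowledge
    elif config == "REK+/RAG-":
        context = "\n".join(rek_chunks)  # whole REK supplied, no retrieval step
    elif config == "REK+/RAG+":
        context = "\n".join(retrieve(ct_findings, rek_chunks))  # retrieved excerpts only
    else:
        raise ValueError(f"unknown configuration: {config}")
    reference = f"Reference:\n{context}\n\n" if context else ""
    return f"{task}\n\n{reference}CT findings:\n{ct_findings}"


# Placeholder REK chunks and CT findings (not taken from the study).
rek = [
    "T category: criteria based on tumor size and extension beyond the pancreas.",
    "N category: criteria based on regional lymph node metastasis.",
    "Resectability: criteria based on tumor contact with major vessels.",
]
findings = "Tumor size 25 mm with contact with major vessels; no lymph node metastasis."

if __name__ == "__main__":
    for cfg in ("REK-/RAG-", "REK+/RAG-", "REK+/RAG+"):
        print(f"--- {cfg} ---\n{build_prompt(findings, cfg, rek)}\n")
```

The design point the sketch illustrates is that REK+/RAG- and REK+/RAG+ draw on the same knowledge source; only the RAG configuration narrows it to the excerpts judged relevant, which is also what allows those excerpts to be shown to the physician.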
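Key contribution: The main contribution is demonstrating that the RAG framework is effective for pancreatic cancer staging. In the authors' earlier lung cancer study, the comparator LLM differed from NotebookLM's internal model, so it was unclear whether NotebookLM's advantage came from RAG or from model differences; comparing NotebookLM with its own internal model isolates the effect of RAG. Compared with using the LLM directly, RAG lets the model draw on external knowledge, improves staging accuracy, and surfaces the retrieved evidence, making the results more interpretable and trustworthy.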
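Key design: The external knowledge base is a summary of Japan's pancreatic cancer staging guidelines. The evaluation metrics are staging accuracy, TNM classification accuracy, and retrieval accuracy; retrieval accuracy, measured only for REK+/RAG+, reflects whether the retrieved REK excerpts are sufficient to support the staging decision. The staging criteria comprise TNM classification, local invasion factors, and resectability classification.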
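The metrics above could be computed roughly as follows. This is a hypothetical sketch: the summary does not say how per-criterion results are aggregated, so it assumes that a case counts as correctly staged only when every criterion matches the reference and that TNM accuracy checks only the T, N, and M fields. The field names, the single `local_invasion` field (the guidelines define several local invasion factors), and the example labels are illustrative, and the `retrieval_sufficient` flag stands in for a human judgment recorded only for REK+/RAG+.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; the study's actual criteria are more detailed.
STAGING_FIELDS = ("T", "N", "M", "local_invasion", "resectability")
TNM_FIELDS = ("T", "N", "M")


@dataclass
class CaseResult:
    predicted: dict[str, str]
    reference: dict[str, str]
    # Reviewer judgment of whether the retrieved excerpts sufficed (RAG+ only).
    retrieval_sufficient: Optional[bool] = None


def fraction_correct(results: list[CaseResult], fields: tuple[str, ...]) -> float:
    # A case is a hit only if every listed field matches the reference (assumption).
    hits = sum(
        all(r.predicted.get(f) == r.reference[f] for f in fields) for r in results
    )
    return hits / len(results)


def staging_accuracy(results: list[CaseResult]) -> float:
    return fraction_correct(results, STAGING_FIELDS)


def tnm_accuracy(results: list[CaseResult]) -> float:
    return fraction_correct(results, TNM_FIELDS)


def retrieval_accuracy(results: list[CaseResult]) -> float:
    # Average only over cases where sufficiency was judged.
    judged = [r.retrieval_sufficient for r in results if r.retrieval_sufficient is not None]
    return sum(judged) / len(judged)


# Two illustrative cases with placeholder labels (not data from the study):
# the second gets TNM right but misses the resectability class.
reference = {"T": "T3", "N": "N0", "M": "M0", "local_invasion": "PV+", "resectability": "BR"}
cases = [
    CaseResult(predicted=dict(reference), reference=reference, retrieval_sufficient=True),
    CaseResult(
        predicted={**reference, "resectability": "R"},
        reference=reference,
        retrieval_sufficient=False,
    ),
]
```

With these two cases, the stricter all-criteria rule counts only the first case as correctly staged, while TNM accuracy counts both, which mirrors how overall staging accuracy can sit below TNM accuracy.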
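📊 Experimental highlights
REK+/RAG+ (NotebookLM with REK) achieved a staging accuracy of 70%, well above REK+/RAG- (Gemini 2.0 Flash with REK, 38%) and REK-/RAG- (Gemini 2.0 Flash without REK, 35%). For TNM classification, REK+/RAG+ reached 80% accuracy, again exceeding the other two configurations (55% and 50%, respectively). REK+/RAG+ also achieved a retrieval accuracy of 92%.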
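Because each configuration staged the same 100 cases, the reported percentages correspond directly to case counts (for example, 70% means 70 of 100 cases). As a rough illustration of the sampling uncertainty at this sample size, the sketch below computes Wilson score intervals from those counts. The intervals are derived here from the reported percentages and are not reported in the paper. They also treat each configuration as an independent sample and so ignore the paired design; a paired test such as McNemar's would require per-case results, which this summary does not provide.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Reported staging accuracies, expressed as correctly staged cases out of 100.
STAGING_CORRECT = {"REK+/RAG+": 70, "REK+/RAG-": 38, "REK-/RAG-": 35}

for group, k in STAGING_CORRECT.items():
    lo, hi = wilson_interval(k, 100)
    print(f"{group}: {k}% (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval is used instead of the simpler normal approximation because it stays inside [0, 1] and behaves better for proportions away from 50%.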
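🎯 Application scenarios
These results could support clinical decision-making by helping physicians stage pancreatic cancer more accurately and, in turn, choose more appropriate treatment. In the future, the RAG framework could be applied to staging other cancers and combined with richer medical knowledge bases to build more capable clinical decision support systems.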
📄 Abstract (original)
Purpose: Retrieval-augmented generation (RAG) is a technology to enhance the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, since the comparator LLM differed from NotebookLM's internal model, it remained unclear whether its advantage stemmed from RAG or inherent model differences. To better isolate RAG's impact and assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment.

Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as REK. We compared three groups - REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK) - in staging 100 fictional pancreatic cancer cases based on CT findings. Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on the sufficiency of retrieved REK excerpts.

Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented retrieved REK excerpts, achieving a retrieval accuracy of 92%.

Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve LLM's staging accuracy. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability for clinical diagnosis and classification.