An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

Authors: Roma Shusterman, Allison C. Waters, Shannon O'Neill, Phan Luu, Don M. Tucker

Category: cs.CL

Published: 2024-07-23

Comments: 25 pages, 4 figures

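💡 One-Sentence Summary

Proposes an LLM prompting strategy based on active inference that makes LLM responses more reliable in medical settings.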

🎯 Matched Area: Pillar 9: Embodied Foundation Models

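Keywords: large language models, active inference, medical applications, prompt engineering, cognitive behavioral therapy, actor-critic framework, reliability, domain knowledge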

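📋 Key Points

Existing LLMs suffer from non-determinism, potential errors, and a lack of quality control in medical applications, which limits their use.
The paper proposes an actor-critic LLM prompting protocol based on active inference, which improves response quality by restricting the model to validated domain knowledge and having a supervising agent evaluate each response.
In a blind evaluation, CBT-I experts rated the LLM-generated responses highly, often above the appropriate responses written by human therapists.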

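📝 Abstract (Chinese Version, Translated)

Large language models (LLMs) hold great potential in medicine, particularly for education, training, assessment, and treatment. However, their non-determinism, their potential to produce incorrect or harmful responses, and the lack of quality control make medical use challenging. This paper proposes a framework for improving LLM responses that restricts the LLM's knowledge base to domain-specific datasets containing validated medical information. It also introduces an actor-critic LLM prompting protocol based on active inference principles, in which a Therapist agent first responds to a patient query and a Supervisor agent then evaluates and adjusts that response to ensure accuracy and reliability. A validation study showed that experienced cognitive behavioral therapy for insomnia (CBT-I) experts rated the LLM-generated responses highly, often above the appropriate responses written by therapists. This structured approach aims to integrate advanced LLM technology into medical applications in a way that meets regulatory requirements, enabling the safe and effective use of validated, special-purpose medical LLMs.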

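🔬 Method Details

Problem definition: The paper addresses the insufficient reliability of large language models (LLMs) in medical practice, which stems from their non-determinism, their tendency to produce inaccurate or harmful responses, and the absence of quality control. Existing approaches cannot guarantee that LLM output in medical settings is accurate and safe, which hinders wider adoption of LLMs in healthcare.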

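Core idea: Drawing on the principle of active inference from human cognition, the paper builds an actor-critic framework in which a "Therapist" agent (the actor) generates an initial response and a "Supervisor" agent (the critic) evaluates and adjusts it to ensure accuracy and reliability. In addition, restricting the LLM's knowledge base to validated, domain-specific medical information further improves response quality.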
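The knowledge-base restriction can be pictured with the minimal sketch below. It is not the paper's implementation: the passage format, the source IDs, the prompt wording, and the token-overlap retriever are all hypothetical stand-ins (a real system would likely use a proper search or embedding index). What it shows is the control flow: answer only from validated passages, and decline when none are relevant.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical ID of a validated CBT-I document
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real retriever."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2, min_overlap: int = 2) -> list[Passage]:
    """Rank passages by tokens shared with the query; keep those with at least min_overlap."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    relevant = [(s, p) for s, p in scored if s >= min_overlap]
    relevant.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in relevant[:k]]


def build_grounded_prompt(query: str, corpus: list[Passage]) -> Optional[str]:
    """Return a prompt confined to validated passages, or None when none are relevant."""
    passages = retrieve(query, corpus)
    if not passages:
        # Out-of-scope query: decline or escalate instead of letting the model improvise.
        return None
    material = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer ONLY from the validated CBT-I material below. "
        "If it does not cover the question, say so.\n\n"
        f"Material:\n{material}\n\nPatient question: {query}"
    )
```

Returning `None` instead of a prompt for out-of-scope questions is a design choice in this sketch: it keeps the model from drawing on its general pretraining knowledge when the validated corpus has nothing to say.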

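Technical framework: The framework has two main modules, the Therapist agent and the Supervisor agent. The Therapist agent receives the patient's query and uses the LLM to generate an initial response. The Supervisor agent evaluates that response and adjusts it according to predefined rules and criteria so that it is accurate, relevant, and safe. The overall process mimics both a human therapist's self-reflection and adjustment during treatment and the process of peer supervision.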
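One actor-critic pass of the Therapist/Supervisor protocol might look like the sketch below. The prompt templates, the `APPROVE` convention, and the `Exchange` record are hypothetical, and each agent is modeled as a plain text-in/text-out callable so that any LLM client (or a test stub) can be plugged in; the paper's actual prompts and evaluation criteria are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable

# Any text-in/text-out model client; wiring it to a real LLM API is left open (assumption).
LLM = Callable[[str], str]

# Hypothetical prompt templates, not the paper's actual prompts.
THERAPIST_PROMPT = (
    "You are a CBT-I therapist. Using only the validated material you were given, "
    "reply to the patient.\nPatient: {query}"
)
SUPERVISOR_PROMPT = (
    "You supervise a CBT-I therapist. Check the draft reply for accuracy, relevance and safety.\n"
    "If it is acceptable, answer exactly APPROVE. Otherwise answer with a corrected reply.\n"
    "Patient: {query}\nDraft: {draft}"
)


@dataclass
class Exchange:
    draft: str     # the Therapist (actor) output
    final: str     # what would be sent to the patient
    revised: bool  # True if the Supervisor (critic) replaced the draft


def respond(query: str, therapist: LLM, supervisor: LLM) -> Exchange:
    """One actor-critic pass: the Therapist drafts, the Supervisor approves or rewrites."""
    draft = therapist(THERAPIST_PROMPT.format(query=query))
    verdict = supervisor(SUPERVISOR_PROMPT.format(query=query, draft=draft)).strip()
    if verdict.upper() == "APPROVE":
        return Exchange(draft=draft, final=draft, revised=False)
    return Exchange(draft=draft, final=verdict, revised=True)
```

Keeping both the draft and the final text in `Exchange` makes every Supervisor intervention auditable, which fits the quality-control motivation; a fuller version could also pass the retrieved validated material to the Supervisor or loop until approval.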

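Key innovation: The paper's key innovation is applying active inference principles to LLM prompt engineering, using an actor-critic framework that simulates human cognitive processes to make LLMs more reliable in medical settings. Compared with conventional prompting methods, this approach gives tighter control over the LLM's output and helps keep it consistent with the professional standards and ethical norms of medicine.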

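Key design choices: 1) restricting the LLM's knowledge base to validated, domain-specific medical information, such as CBT-I datasets; 2) designing evaluation and adjustment rules for the Supervisor agent so that it can identify and correct erroneous or inappropriate responses from the Therapist agent; 3) a blind evaluation in which experienced CBT-I therapists rated the LLM-generated responses alongside appropriate and inappropriate responses written by human therapists.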
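The blind-evaluation design in point 3 can be illustrated with the sketch below, under stated assumptions: the condition labels, the numeric rating scale, the fixed shuffle seed, and the use of a simple per-condition mean are all hypothetical and are not taken from the paper's protocol. The point it shows is that raters see only the query and the response text, and the condition labels are used only after all ratings are collected.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable

# (query, response text) -> score; the rating scale is an assumption.
Rater = Callable[[str, str], float]


def blind_evaluate(items: list[tuple[str, str, str]], rater: Rater, seed: int = 0) -> dict[str, float]:
    """Rate (query, condition, response) items in shuffled order without revealing the condition.

    Returns the mean score per condition, e.g. "llm", "appropriate", "inappropriate"
    (hypothetical labels).
    """
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)  # randomize presentation so position does not reveal the source
    scores: dict[str, list[float]] = defaultdict(list)
    for i in order:
        query, condition, response = items[i]
        # The rater only ever sees the query and the response text.
        scores[condition].append(rater(query, response))
    # Unblind after rating: average within each condition.
    return {condition: mean(vals) for condition, vals in scores.items()}
```

A real study would collect ratings from several human experts and apply an appropriate statistical comparison; the per-condition mean here only illustrates the unblinding step.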

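📊 Experimental Highlights

Across 100 patient queries, LLM responses refined through the active inference framework received high ratings from CBT-I experts, often exceeding the appropriate responses written by human therapists. This suggests the method can substantially improve the reliability and practical usefulness of LLMs in medical settings. Specifically, the LLM-generated responses performed well in accuracy, relevance, and safety.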

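🎯 Application Scenarios

The results could be applied to medical consultation, patient education, clinical decision support, and related areas. By providing reliable medical knowledge and advice, LLMs can help patients better understand their conditions, assist physicians with diagnosis and treatment, and improve the efficiency and quality of healthcare services. In the future, the technique may be extended to other medical fields to benefit more patients and physicians.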

📄 Abstract (Original)

Continuing advances in Large Language Models (LLMs) in artificial intelligence offer important capacities in intuitively accessing and using medical knowledge in many contexts, including education and training as well as assessment and treatment. Most of the initial literature on LLMs in medicine has emphasized that LLMs are unsuitable for medical use because they are non-deterministic, may provide incorrect or harmful responses, and cannot be regulated to assure quality control. If these issues could be corrected, optimizing LLM technology could benefit patients and physicians by providing affordable, point-of-care medical knowledge. Our proposed framework refines LLM responses by restricting their primary knowledge base to domain-specific datasets containing validated medical information. Additionally, we introduce an actor-critic LLM prompting protocol based on active inference principles of human cognition, where a Therapist agent initially responds to patient queries, and a Supervisor agent evaluates and adjusts responses to ensure accuracy and reliability. We conducted a validation study where expert cognitive behaviour therapy for insomnia (CBT-I) therapists evaluated responses from the LLM in a blind format. Experienced human CBT-I therapists assessed responses to 100 patient queries, comparing LLM-generated responses with appropriate and inappropriate responses crafted by experienced CBT-I therapists. Results showed that LLM responses received high ratings from the CBT-I therapists, often exceeding those of therapist-generated appropriate responses. This structured approach aims to integrate advanced LLM technology into medical applications, meeting regulatory requirements for establishing the safe and effective use of special purpose validated LLMs in medicine.
