Discourse Diversity in Multi-Turn Empathic Dialogue
Authors: Hongli Zhan, Emma S. Gueorguieva, Javier Hernandez, Jina Suh, Desmond C. Ong, Junyi Jessy Li
Categories: cs.CL, cs.AI
Published: 2026-04-13
💡 One-Sentence Takeaway
Proposes the MINT framework to address the lack of discourse diversity in multi-turn empathic dialogue.
🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: empathic dialogue, multi-turn dialogue, reinforcement learning, discourse diversity, large language models, affective computing, natural language processing
📋 Key Points
- Existing LLMs exhibit high discourse repetition and low diversity in multi-turn empathic dialogue, undermining conversational effectiveness.
- This paper proposes the MINT framework, which combines an empathy quality reward with a cross-turn tactic novelty signal to optimize discourse diversity in multi-turn dialogue.
- Experiments show that MINT improves empathy quality by 25.3% across 1.7B and 4B models while reducing cross-turn discourse repetition by 26.3% on the 4B model, surpassing all baseline methods.
📝 Abstract (Summary)
Large language models (LLMs) can generate highly empathic responses in single-turn dialogue, but their discourse structure tends to be formulaic, lacking diversity. This paper examines whether this formulaicity carries over to multi-turn dialogue and finds that LLMs reuse strategies across turns at nearly twice the rate of humans. To address this, the paper proposes MINT (Multi-turn Inter-tactic Novelty Training), a framework that uses reinforcement learning to optimize discourse diversity in multi-turn empathic dialogue. Experiments show that MINT significantly improves empathy quality while reducing cross-turn discourse repetition.
🔬 Method Details
Problem definition: This paper targets the high discourse repetition of large language models in multi-turn empathic dialogue. Existing methods perform well in single-turn settings, but in multi-turn conversations they lack strategy diversity, degrading the quality of emotional support.
Core idea: The paper proposes the MINT framework, which uses reinforcement learning to optimize discourse diversity by combining an empathy quality reward with a tactic novelty signal, aiming to improve conversational effectiveness and support quality.
Technical framework: MINT comprises two main components: an empathy quality reward, which scores how empathic a response is, and a cross-turn tactic novelty signal, which encourages the model to deploy different strategies across turns of a conversation.
Key innovation: MINT is the first reinforcement learning framework to optimize discourse-move diversity in multi-turn empathic dialogue, combining empathy quality with tactic novelty to overcome the limitations of existing methods.
Key design: MINT's training objective jointly accounts for empathy quality and tactic novelty within a reinforcement learning framework, so that the model learns to adjust its strategy selection during training.
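The cross-turn repetition the paper measures (LLMs reuse a tactic in the immediately following turn at a 0.50-0.56 rate vs. 0.27 for humans) can be sketched as follows. This helper is illustrative only: it assumes each supporter turn has been annotated with a set of discourse tactics, and the paper's exact metric may differ.

```python
def cross_turn_reuse_rate(turn_tactics):
    """Fraction of tactics in each supporter turn that reappear in the
    immediately following supporter turn.

    turn_tactics: list of sets, one set of tactic labels per
    consecutive supporter turn.
    """
    reused = total = 0
    for prev, curr in zip(turn_tactics, turn_tactics[1:]):
        total += len(prev)          # tactics eligible for reuse
        reused += len(prev & curr)  # tactics actually repeated
    return reused / total if total else 0.0

# Example: a supporter who repeats "reassurance" in every turn.
turns = [{"reassurance", "question"},
         {"reassurance"},
         {"reassurance", "suggestion"}]
print(cross_turn_reuse_rate(turns))  # 2 of 3 prior tactics reused -> 0.666...
```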
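A minimal sketch of how such a combined reward could look. The function name, the novelty definition, and the weight `lam` are all illustrative assumptions, not the paper's exact formulation:

```python
def mint_style_reward(quality_score, curr_tactics, prev_tactics, lam=0.5):
    """Illustrative combined reward: empathy quality plus a weighted
    cross-turn tactic novelty bonus.

    quality_score: scalar empathy quality of the current response
    curr_tactics / prev_tactics: sets of tactic labels for the current
    and previous supporter turns
    lam: hypothetical weight trading off quality against novelty
    """
    if curr_tactics:
        # Novelty = fraction of current tactics not used in the prior turn.
        novelty = len(curr_tactics - prev_tactics) / len(curr_tactics)
    else:
        novelty = 0.0
    return quality_score + lam * novelty

# Example: one of two tactics is new -> novelty 0.5, reward 0.8 + 0.25.
r = mint_style_reward(0.8, {"question", "validation"}, {"validation"})
print(r)  # 1.05
```

The design intuition matches the section above: a response scores highest when it is both empathic and deploys discourse moves not already used in the previous turn.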
📊 Experimental Highlights
The best MINT variant improves aggregate empathy quality by 25.3% over the vanilla model across 1.7B and 4B models, while reducing cross-turn discourse repetition by 26.3% on the 4B model, surpassing all baselines, including quality-only and token-level diversity methods, on both measures.
🎯 Application Scenarios
Potential application areas include mental-health support, online counseling, and human-computer interaction. By making dialogue both more empathic and more varied, such systems can better meet users' emotional needs and improve user experience, with possible longer-term impact on intelligent customer service and affective computing.
📄 Abstract (Original)
Large language models (LLMs) produce responses rated as highly empathic in single-turn settings (Ayers et al., 2023; Lee et al., 2024), yet they are also known to be formulaic generators that reuse the same lexical patterns, syntactic templates, and discourse structures across tasks (Jiang et al., 2025; Shaib et al., 2024; Namuduri et al., 2025). Less attention has been paid to whether this formulaicity extends to the level of discourse moves, i.e., what a response does for the person it is addressing. This question is especially consequential for empathic dialogue, where effective support demands not just a kind response at one moment but varied strategies as a conversation unfolds (Stiles et al., 1998). Indeed, prior work shows that LLMs reuse the same tactic sequences more than human supporters in single-turn settings (Gueorguieva et al., 2026). We extend this analysis to multi-turn conversations and find that the rigidity compounds: once a tactic appears in a supporter turn, LLMs reuse it in the next at nearly double the rate of humans (0.50-0.56 vs. 0.27). This pattern holds across LLMs serving as supporters in real emotional support conversations, and is invisible to standard similarity metrics. To address this gap, we introduce MINT (Multi-turn Inter-tactic Novelty Training), the first reinforcement learning framework to optimize discourse move diversity across multi-turn empathic dialogue. The best MINT variant combines an empathy quality reward with a cross-turn tactic novelty signal, improving aggregate empathy by 25.3% over vanilla across 1.7B and 4B models while reducing cross-turn discourse move repetition by 26.3% on the 4B model, surpassing all baselines including quality-only and token-level diversity methods on both measures. These results suggest that what current models lack is not empathy itself, but the ability to vary their discourse moves across a conversation.