Discourse Diversity in Multi-Turn Empathic Dialogue
Authors: Hongli Zhan, Emma S. Gueorguieva, Javier Hernandez, Jina Suh, Desmond C. Ong, Junyi Jessy Li
Categories: cs.CL, cs.AI
Published: 2026-04-13
💡 One-Sentence Takeaway
Proposes the MINT framework to address the lack of discourse diversity in multi-turn empathic dialogue.
🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: empathic dialogue, multi-turn dialogue, reinforcement learning, discourse diversity, large language models, affective computing, natural language processing
📋 Key Points
- Existing LLMs exhibit high discourse repetition and low diversity in multi-turn empathic dialogue, undermining conversational effectiveness.
- This paper proposes the MINT framework, which combines an empathy quality reward with a cross-turn tactic novelty signal to optimize discourse diversity in multi-turn dialogue.
- Experiments show that MINT improves empathy quality by 25.3% across 1.7B and 4B models while reducing cross-turn discourse repetition by 26.3% on the 4B model, surpassing all baseline methods.
📝 Abstract (Summary)
Large language models (LLMs) can generate highly empathic responses in single-turn dialogue, but their discourse structure tends to be formulaic, lacking diversity. This paper examines whether this formulaicity carries over to multi-turn dialogue and finds that LLMs reuse strategies across turns at nearly twice the rate of humans. To address this, the paper proposes MINT (Multi-turn Inter-tactic Novelty Training), a framework that uses reinforcement learning to optimize discourse diversity in multi-turn empathic dialogue. Experiments show that MINT significantly improves empathy quality while reducing cross-turn discourse repetition.
🔬 Method Details
Problem definition: This paper targets the high discourse repetition of large language models in multi-turn empathic dialogue. Existing methods perform well in single-turn settings, but in multi-turn conversations they lack strategy diversity, degrading the quality of emotional support.
Core idea: The paper proposes the MINT framework, which uses reinforcement learning to optimize discourse diversity by combining an empathy quality reward with a tactic novelty signal, aiming to improve conversational effectiveness and support quality.
Technical framework: MINT comprises two main components: an empathy quality reward, which scores how empathic a response is, and a cross-turn tactic novelty signal, which encourages the model to deploy different strategies across turns of a conversation.
Key innovation: MINT is the first reinforcement learning framework to optimize discourse-move diversity in multi-turn empathic dialogue, combining empathy quality with tactic novelty to overcome the limitations of existing methods.
Key design: MINT's training objective jointly accounts for empathy quality and tactic novelty within a reinforcement learning framework, so that the model learns to adjust its strategy selection during training.
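The cross-turn repetition the paper measures (LLMs reuse a tactic in the immediately following turn at a 0.50-0.56 rate vs. 0.27 for humans) can be sketched as follows. This helper is illustrative only: it assumes each supporter turn has been annotated with a set of discourse tactics, and the paper's exact metric may differ.

```python
def cross_turn_reuse_rate(turn_tactics):
    """Fraction of tactics in each supporter turn that reappear in the
    immediately following supporter turn.

    turn_tactics: list of sets, one set of tactic labels per
    consecutive supporter turn.
    """
    reused = total = 0
    for prev, curr in zip(turn_tactics, turn_tactics[1:]):
        total += len(prev)          # tactics eligible for reuse
        reused += len(prev & curr)  # tactics actually repeated
    return reused / total if total else 0.0

# Example: a supporter who repeats "reassurance" in every turn.
turns = [{"reassurance", "question"},
         {"reassurance"},
         {"reassurance", "suggestion"}]
print(cross_turn_reuse_rate(turns))  # 2 of 3 prior tactics reused -> 0.666...
```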
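A minimal sketch of how such a combined reward could look. The function name, the novelty definition, and the weight `lam` are all illustrative assumptions, not the paper's exact formulation:

```python
def mint_style_reward(quality_score, curr_tactics, prev_tactics, lam=0.5):
    """Illustrative combined reward: empathy quality plus a weighted
    cross-turn tactic novelty bonus.

    quality_score: scalar empathy quality of the current response
    curr_tactics / prev_tactics: sets of tactic labels for the current
    and previous supporter turns
    lam: hypothetical weight trading off quality against novelty
    """
    if curr_tactics:
        # Novelty = fraction of current tactics not used in the prior turn.
        novelty = len(curr_tactics - prev_tactics) / len(curr_tactics)
    else:
        novelty = 0.0
    return quality_score + lam * novelty

# Example: one of two tactics is new -> novelty 0.5, reward 0.8 + 0.25.
r = mint_style_reward(0.8, {"question", "validation"}, {"validation"})
print(r)  # 1.05
```

The design intuition matches the section above: a response scores highest when it is both empathic and deploys discourse moves not already used in the previous turn.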
📊 Experimental Highlights
The best MINT variant improves aggregate empathy quality by 25.3% over the vanilla model across 1.7B and 4B models, while reducing cross-turn discourse repetition by 26.3% on the 4B model, surpassing all baselines, including quality-only and token-level diversity methods, on both measures.
🎯 Application Scenarios
Potential application areas include mental-health support, online counseling, and human-computer interaction. By making dialogue both more empathic and more varied, such systems can better meet users' emotional needs and improve user experience, with possible longer-term impact on intelligent customer service and affective computing.
📄 Abstract (Original)
Large language models (LLMs) produce responses rated as highly empathic in single-turn settings (Ayers et al., 2023; Lee et al., 2024), yet they are also known to be formulaic generators that reuse the same lexical patterns, syntactic templates, and discourse structures across tasks (Jiang et al., 2025; Shaib et al., 2024; Namuduri et al., 2025). Less attention has been paid to whether this formulaicity extends to the level of discourse moves, i.e., what a response does for the person it is addressing. This question is especially consequential for empathic dialogue, where effective support demands not just a kind response at one moment but varied strategies as a conversation unfolds (Stiles et al., 1998). Indeed, prior work shows that LLMs reuse the same tactic sequences more than human supporters in single-turn settings (Gueorguieva et al., 2026). We extend this analysis to multi-turn conversations and find that the rigidity compounds: once a tactic appears in a supporter turn, LLMs reuse it in the next at nearly double the rate of humans (0.50-0.56 vs. 0.27). This pattern holds across LLMs serving as supporters in real emotional support conversations, and is invisible to standard similarity metrics. To address this gap, we introduce MINT (Multi-turn Inter-tactic Novelty Training), the first reinforcement learning framework to optimize discourse move diversity across multi-turn empathic dialogue. The best MINT variant combines an empathy quality reward with a cross-turn tactic novelty signal, improving aggregate empathy by 25.3% over vanilla across 1.7B and 4B models while reducing cross-turn discourse move repetition by 26.3% on the 4B model, surpassing all baselines including quality-only and token-level diversity methods on both measures. These results suggest that what current models lack is not empathy itself, but the ability to vary their discourse moves across a conversation.