SeDT: Sentence-Transformer Decision-Transformer Conditioning for Multi-Turn Conversation Reliability

Authors: Ramakrishna Vamsi Setti, Jagadeesh Rachapudi, Sachin Chaudhary, Praful Hambarde, Amit Shukla

Categories: cs.CL, cs.AI

Published: 2026-05-26

💡 One-Line Summary

Proposes SeDT, which improves the reliability of multi-turn conversations through sentence-transformer / Decision-Transformer conditioning.

🎯 Matched areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

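Keywords: multi-turn conversation, lost in conversation, large language models, sentence transformer, Decision Transformer, conditioning, offline reinforcement learning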

📋 Key Points

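Current large language models get "lost in conversation": their reliability drops sharply when a task unfolds over multiple turns.
SeDT introduces return-to-go conditioning, scoring each part of the conversation history for relevance using a sentence transformer together with lexical and positional signals.
Experiments show that SeDT beats the sharded baseline for every LLM and generation task tested and reduces unreliability in most of them.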

📝 Abstract (Translated Summary)

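Large language models (LLMs) perform well when a task is given in a single turn, but when the same task is revealed gradually over multiple turns, their performance drops by up to 39%, a phenomenon known as "Lost in Conversation". The drop is mainly a reliability problem: aptitude (best-case performance) falls by only 16%, while unreliability more than doubles (+112%). The paper argues that the root cause is structural: a flat conversation history gives every previous turn the same implicit weight, so the model cannot tell a critical constraint from incidental dialog. It proposes SeDT, a training-free, inference-time method that addresses this by borrowing return-to-go conditioning from offline reinforcement learning. SeDT annotates each conversation shard with a cumulative relevance score built from semantic, lexical, and positional signals, and presents the full annotated history to the model at the final turn, with no weight changes, no training data, and no discarded context. Evaluated on the Lost-in-Conversation benchmark with three LLMs and three generation tasks, SeDT outperforms the sharded baseline in all nine model-task combinations, improves mean performance P by up to +37.7%, and also reduces unreliability in seven of the nine combinations. In short, telling the model which past turns matter is enough to recover much of the performance lost in conversation.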

🔬 Method Details

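Problem definition: The paper targets the "lost in conversation" problem, in which an LLM's performance drops sharply when the information it needs for a task arrives incrementally over several turns. Existing approaches treat the conversation history as a flat sequence that weights every turn equally, so the model cannot separate key information from incidental chatter and struggles to stay consistent and reliable across the conversation.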

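Core idea: Borrowing return-to-go conditioning from offline reinforcement learning, the paper gives each turn of the conversation history its own relevance score so that key information stands out. With this signal, the model can tell which earlier turns carry binding constraints when it produces its final answer. The crux of SeDT is how to compute each turn's relevance score and how to feed that score into the model's inference effectively.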
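
In a Decision Transformer, each step is conditioned on its return-to-go, the sum of all rewards from that step to the end of the episode: R_t = r_t + r_(t+1) + ... + r_T. The sketch below computes that quantity over a list of per-turn scores. Treating each shard's relevance as its "reward" and summing from that shard onward is our assumption about how a "cumulative relevance score" might be formed; the summary does not give SeDT's exact formula.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Decision-Transformer-style return-to-go: R_t = r_t + r_(t+1) + ... + r_T.

    Here each 'reward' is a hypothetical per-shard relevance score; the SeDT
    summary does not confirm that this exact suffix sum is what it uses.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each position accumulates every reward from itself to the end.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


# Hypothetical relevance scores for three shards.
print(returns_to_go([0.9, 0.1, 0.6]))
```

Summing backwards keeps the computation linear in the number of turns. As in reinforcement learning, the earliest positions receive the largest values because they accumulate everything that follows.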

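Technical framework: SeDT is a training-free, inference-time method that works in three steps:
1. Annotate the history: for each turn, compute a semantic signal (using a sentence transformer), a lexical signal, and a positional signal, and combine them into a cumulative relevance score.
2. Present the annotated history: at the final turn, give the model the complete conversation history together with these annotations.
3. Run inference: the model reasons over the annotated history and generates the final response.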
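
The three steps above can be sketched as a prompt-building routine. This is a minimal sketch under stated assumptions. The `Shard` type, the bracketed `[relevance=x.xx]` tag, the header sentence, and the example scores are all hypothetical, because the summary does not specify SeDT's annotation format. The flat renderer stands in for the sharded baseline, in which every turn appears equally important.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    turn: int         # 1-based turn index at which the shard was revealed
    text: str         # user-provided content of that turn
    relevance: float  # cumulative relevance score, computed elsewhere


def render_flat(shards: List[Shard]) -> str:
    """Baseline: plain concatenated history; nothing marks which turns matter."""
    return "\n".join(f"Turn {s.turn}: {s.text}" for s in shards)


def render_annotated(shards: List[Shard]) -> str:
    """SeDT-style (hypothetical format): keep every shard and prefix it with its score."""
    header = "Each earlier turn is tagged with a relevance score; higher means more important."
    lines = [f"Turn {s.turn} [relevance={s.relevance:.2f}]: {s.text}" for s in shards]
    return header + "\n" + "\n".join(lines)


# Hypothetical three-turn history with made-up scores.
history = [
    Shard(1, "Write a Python function that parses dates.", 0.91),
    Shard(2, "By the way, I like short variable names.", 0.12),
    Shard(3, "Input strings use the DD/MM/YYYY format.", 0.78),
]
print(render_annotated(history))
```

The low-scoring second turn is still included. In this sketch, as in the method description, the full history is kept and a signal is added, rather than context being pruned.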

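Key innovations:
1. Training-free: SeDT needs no extra training data or weight updates and can be applied directly to existing LLMs.
2. Return-to-go conditioning: following offline reinforcement learning, each turn of the history is given its own weight, so key information stands out.
3. Multi-signal fusion: combining semantic, lexical, and positional signals gives a more accurate estimate of each turn's relevance.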

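Key design choices:
1. Sentence transformer: a pretrained sentence-transformer model (for example, Sentence-BERT) extracts the semantic content of each turn.
2. Relevance score: the semantic, lexical, and positional signals are combined with weights into a per-turn relevance score. The specific weights are not given in this summary.
3. Annotation format: the relevance scores are added to the conversation history so the model can use them during inference; the exact format is not specified here.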
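
The multi-signal relevance score can be illustrated as a weighted sum of three signals, each measured against the final user turn. Every specific choice here is an assumption: scoring against the final turn, the 0.5/0.3/0.2 weights, Jaccard token overlap as the lexical signal, and a linear recency term as the positional signal. `toy_embed` is also hypothetical. It is a hashed bag-of-words stand-in for a real sentence-transformer embedding that keeps the example dependency-free.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence


def tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in for a sentence-transformer: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for tok in tokens(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: str, b: str) -> float:
    """Lexical signal: token-set overlap |A & B| / |A | B|."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def relevance_scores(
    shards: List[str],
    final_turn: str,
    embed: Callable[[str], Sequence[float]] = toy_embed,
    w_sem: float = 0.5,  # hypothetical weights; the summary does not report them
    w_lex: float = 0.3,
    w_pos: float = 0.2,
) -> List[float]:
    """Weighted fusion of semantic, lexical, and positional signals for each shard."""
    query_vec = embed(final_turn)
    n = len(shards)
    scores = []
    for i, text in enumerate(shards):
        semantic = cosine(embed(text), query_vec)
        lexical = jaccard(text, final_turn)
        positional = (i + 1) / n  # hypothetical linear recency term in (0, 1]
        scores.append(w_sem * semantic + w_lex * lexical + w_pos * positional)
    return scores


print(relevance_scores(["parse the dates", "I like tea"], "parse the dates"))
```

To use a real sentence transformer, pass its encoder as `embed`, for example `embed=model.encode` with `model = SentenceTransformer(...)` from the sentence-transformers package. `cosine` accepts any sequence of floats, including the NumPy vectors that `encode` returns for a single string.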

📊 Experimental Highlights

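SeDT performs strongly on the Lost-in-Conversation benchmark, outperforming the sharded baseline in all nine model-task combinations. Mean performance P improves by up to +37.7%, and unreliability also decreases in seven of the nine combinations. Per-task before/after figures are not given in this summary.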
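
To help interpret these numbers, the sketch below computes per-instruction metrics over repeated simulation runs. It assumes the Lost-in-Conversation benchmark's definitions as we understand them: P is the mean score, aptitude is the 90th-percentile score, and unreliability is the gap between the 90th and 10th percentiles. The benchmark then averages these metrics across instructions. The summary does not say whether +37.7% is a relative gain or a gain in absolute points, so `relative_change` only shows how a relative figure such as the +112% unreliability increase should be read.

```python
import statistics
from typing import Dict, Sequence


def lic_metrics(scores: Sequence[float]) -> Dict[str, float]:
    """Per-instruction metrics over repeated runs (scores on a 0-100 scale).

    Assumed definitions: P = mean, aptitude = 90th percentile,
    unreliability = 90th percentile minus 10th percentile.
    """
    cuts = statistics.quantiles(scores, n=10, method="inclusive")  # 9 decile cut points
    p10, p90 = cuts[0], cuts[-1]
    return {"P": statistics.fmean(scores), "aptitude": p90, "unreliability": p90 - p10}


def relative_change(before: float, after: float) -> float:
    """Relative change in percent; +112 means the value more than doubled."""
    return (after - before) / before * 100.0


runs = [float(x) for x in range(0, 101, 10)]  # 11 toy scores: 0, 10, ..., 100
print(lic_metrics(runs))
print(relative_change(50.0, 106.0))
```

Under these definitions, a model can keep a high aptitude while its unreliability grows. This pattern is the one the abstract describes: a 16% aptitude drop alongside a +112% unreliability increase.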

🎯 Application Scenarios

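SeDT applies to any setting that involves multi-turn dialogue, such as customer-service agents, virtual assistants, and task-oriented dialogue systems. By making multi-turn conversations more reliable, it can improve the user experience and let LLMs handle more complex conversational tasks. Because it requires no training or weight changes, it is easy to deploy on top of existing models.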

📄 Abstract (Original)

Large language models (LLMs) achieve impressive performance when a task is fully specified in a single turn, yet the same models lose up to 39% of that performance when the identical task is revealed incrementally across multiple turns, a phenomenon documented at scale as Lost in Conversation. Crucially, this collapse is almost entirely a reliability failure: aptitude (best-case performance) falls by only 16%, while unreliability more than doubles (+112%). We argue that the root cause is structural: a flat conversation history assigns equal implicit weight to every prior turn, giving the model no signal to distinguish a critical constraint from incidental dialog. We present SeDT (Sentence-transformer Decision-Transformer), a training-free inference-time method that resolves this by importing return-to-go conditioning from offline reinforcement learning. SeDT annotates each conversation shard with a cumulative relevance score derived from three complementary signals (semantic, lexical, and positional) and presents the full annotated history to the model at the final turn, without weight changes, without training data, and without discarding context. Evaluated on the Lost-in-Conversation benchmark across three LLMs and three generation tasks, SeDT outperforms the sharded baseline in all nine model-task combinations, with gains of up to +37.7% in mean performance P and simultaneous reductions in unreliability in seven of the nine combinations. In short, telling the model which past turns matter is sufficient to substantially recover the performance lost in conversation.
