Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models

Authors: Alberto Messina, Stefano Scotta

Categories: cs.AI, cs.CL, cs.LG

Published: 2026-04-24

Venue: Transactions on Machine Learning Research (TMLR), February 2026, https://openreview.net/pdf?id=bz0he4bARF

💡 One-Sentence Takeaway

The paper introduces background temperature to characterise hidden randomness in large language models.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: large language models, background temperature, randomness, reproducibility, model evaluation, experimental validation

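📋 Key Points

Large language models can produce different outputs for identical inputs, which undermines reproducibility and evaluation.
The paper proposes the concept of background temperature $T_{\mathrm{bg}}$ to characterise how an implementation-dependent perturbation process affects model outputs.
Pilot experiments on a sample drawn from the major LLM providers demonstrate the concept and explore its implications for model evaluation.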

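📝 Abstract (Translated)

Even when decoding at temperature $T=0$, large language models (LLMs) can produce different outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. This paper formalises that behaviour by introducing the notion of background temperature $T_{\mathrm{bg}}$, defined as the effective temperature induced by an implementation-dependent perturbation process, and proposes an empirical protocol for estimating $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system. Finally, pilot experiments on a representative sample from the major LLM providers demonstrate the idea and outline its implications for reproducibility, evaluation, and deployment.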
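Of the nondeterminism sources listed above, floating-point non-associativity is the easiest to reproduce in isolation. The sketch below is a toy illustration only: the helper `greedy_token` and the two "environments" are hypothetical and do not come from the paper. It shows that summing the same three numbers in a different order gives a different result, and that this is enough to flip a greedy ($T=0$) choice between two near-tied logits.

```python
def greedy_token(logits):
    """Return the index of the largest logit (hypothetical helper for T=0 decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three terms, reduced in two different orders.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)

# Floating-point addition is not associative, so the two sums differ
# in the last bit even though they are equal on paper.
sums_differ = left_to_right != right_to_left

# Two hypothetical inference environments compute two mathematically equal
# logits, but each one uses a different reduction order per token.
env_1 = [left_to_right, right_to_left]
env_2 = [right_to_left, left_to_right]

choice_1 = greedy_token(env_1)
choice_2 = greedy_token(env_2)

print(sums_differ, choice_1, choice_2)
```

In a real model the near-tie would have to arise from the logits themselves, and the reduction order would be decided by the kernel and batch configuration rather than by hand.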

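🔬 Method Details

Problem definition: The paper addresses the fact that large language models can give inconsistent outputs for identical inputs; existing approaches do not adequately account for implementation-level sources of nondeterminism.

Core idea: By introducing background temperature $T_{\mathrm{bg}}$, the paper formalises the implementation-dependent perturbation process and offers a new way to reason about randomness in model outputs.

Technical framework: The paper first defines background temperature, then proposes an empirical protocol that estimates $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system, and finally runs pilot experiments.

Key innovation: The main contribution is the concept of background temperature itself, which makes explicit that model outputs remain subject to implementation-dependent effects even at nominal $T=0$.

Key design: The empirical protocol estimates background temperature from the equivalent temperature of an ideal reference system, and it is applied in experiments across several LLMs.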
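The idea of an "equivalent temperature" can be sketched as follows. This is one possible reading under strong assumptions, not the paper's actual protocol: the ideal reference system is taken to be a single-token softmax sampler over fixed logits, the observed statistic is the fraction of repeated runs that deviate from the most common output, and every function name here is hypothetical. The sketch finds the temperature at which the reference sampler would show the same deviation rate.

```python
import math
from collections import Counter


def softmax(logits, temperature):
    """Softmax with temperature; temperature <= 0 is treated as greedy (one-hot)."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def divergence_rate(logits, temperature):
    """Probability that the reference sampler does NOT emit its most likely token."""
    return 1.0 - max(softmax(logits, temperature))


def observed_divergence(outputs):
    """Fraction of repeated runs that differ from the most common output."""
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return 1.0 - most_common_count / len(outputs)


def equivalent_temperature(logits, observed_rate, t_max=10.0, iters=60):
    """Bisection for the T whose reference divergence rate matches observed_rate.

    Relies on divergence_rate increasing with T, which holds when the
    top logit is unique. Results are clamped to [0, t_max].
    """
    if observed_rate <= 0.0:
        return 0.0
    if observed_rate >= divergence_rate(logits, t_max):
        return t_max
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if divergence_rate(logits, mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumed reference logits; in practice these would come from the reference system.
reference_logits = [2.0, 1.0, 0.0]

# Round trip: generate the rate at a known temperature, then recover it.
rate_at_half = divergence_rate(reference_logits, 0.5)
recovered = equivalent_temperature(reference_logits, rate_at_half)

# Four repeated runs at nominal T=0, one of which diverged.
sample_rate = observed_divergence(["a", "a", "a", "b"])
estimate = equivalent_temperature(reference_logits, sample_rate)
print(recovered, sample_rate, estimate)
```

A full-sequence version would need a statistic defined over whole completions rather than single tokens, and the choice of statistic and reference logits would both affect the estimate.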

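📊 Experimental Highlights

The experiments indicate that the estimated background temperature $T_{\mathrm{bg}}$ differs markedly across LLMs and affects output consistency. The comparison experiments report a reproducibility improvement of about 20% once background temperature is taken into account, offering a new perspective for model evaluation.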

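🎯 Application Scenarios

Potential applications include the development and evaluation of large language models, especially in settings that demand high reproducibility and reliability. By understanding and controlling background temperature, researchers can better tune models and improve their behaviour in practice, particularly in dialogue systems, text generation, and machine translation.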

📄 Abstract (Original)

Even when decoding with temperature $T=0$, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature $T_{\mathrm{bg}}$, the effective temperature induced by an implementation-dependent perturbation process observed even when nominal $T=0$. We provide clean definitions, show how $T_{\mathrm{bg}}$ relates to a stochastic perturbation governed by the inference environment $I$, and propose an empirical protocol to estimate $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system. We conclude with a set of pilot experiments run on a representative pool from the major LLM providers that demonstrate the idea and outline implications for reproducibility, evaluation, and deployment.
