Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models

Author: Marshall Brett

Categories: cs.LG, cs.CL

Published: 2026-04-08

Comments: 20 pages

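💡 One-Sentence Takeaway

Studies the geometric properties of the Voronoi tessellation in the latent semantic space of large language models and proposes margin refinement procedures (MRP) to refine a converged model.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: large language models, Voronoi tessellation, margin optimization, geometric properties, Fisher information distance, robustness, representation learning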

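📋 Key Points

Large language models operate on discrete tokens but compute in continuous vector spaces, which induces a Voronoi tessellation over the representation space; understanding its geometry is essential.
The paper proposes margin refinement procedures (MRP), which reshape the Voronoi tessellation by optimizing token-decision margins without retraining the model.
Experiments show that MRP with Fisher information distance maximization widens token-decision margins while preserving downstream task performance, although the gains concentrate in high-frequency tokens.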
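
To make the first point concrete, here is a minimal sketch of how a continuous hidden state is assigned to a discrete token cell. It assumes the decision is an argmax over dot-product logits and that the margin is the gap between the top two logits; both are illustrative assumptions, and the names `decide`, `nearest`, and `UNEMBED` are hypothetical rather than taken from the paper. With unit-norm token vectors, the argmax token is also the nearest token vector, which is the Voronoi picture.

```python
import math

def logits(hidden, unembed):
    """Dot product of the hidden state with each token's unembedding row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]

def decide(hidden, unembed):
    """Return (winning token id, margin = top-1 logit minus top-2 logit)."""
    scores = logits(hidden, unembed)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[0], scores[order[0]] - scores[order[1]]

def nearest(hidden, unembed):
    """Token whose unembedding row is closest to the hidden state (Euclidean)."""
    return min(range(len(unembed)), key=lambda i: math.dist(hidden, unembed[i]))

# Toy vocabulary: three unit-norm token directions in 2-D (illustrative values).
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# A state deep inside token 0's cell, and one close to the 0/1 boundary.
token_deep, margin_deep = decide([0.9, 0.3], UNEMBED)
token_edge, margin_edge = decide([0.5, 0.49], UNEMBED)
```

A small margin, as in the second state, means the hidden state sits near a cell boundary, so a small perturbation could flip the predicted token.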

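📝 Abstract (summary)

This paper studies the geometric properties of the Voronoi tessellation over the representation space of a large language model (Qwen3.5-4B-Base). First, recomputing margins in float32 resolves bfloat16 quantization artifacts and validates Mabrok's (2026) linear scaling law of the expressibility gap with R^2 = 0.9997. The same analysis identifies a mid-layer regime of geometric ambiguity in which margin geometry is anti-correlated with cross-entropy (layers 24-28, ρ = -0.29) before crystallizing into alignment at the final layer (ρ = 0.836). Second, the paper shows that the Voronoi tessellation of a converged model can be reshaped through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. A comparison of direct margin maximization with Fisher information distance maximization shows that both methods can correct about 16,300 of the 256K evaluated positions, but they differ critically in collateral damage. Damage from margin maximization escalates with intervention strength until the corrections are overwhelmed. Damage from the Fisher variant stays constant at about 5,300 positions across the validated range (λ = 0.15-0.6), and at λ = 0.6 it achieves a +28% median margin improvement while leaving downstream benchmarks unchanged. The gains concentrate in high-frequency structural tokens (84% of net corrections at λ = 0.6), while the contributions from content and entity-like tokens shrink at higher λ. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.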
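
The bfloat16 artifact mentioned above can be illustrated with a small sketch. It emulates bfloat16 rounding with the standard library by keeping the top 16 bits of the float32 bit pattern (round-to-nearest-even); the logit values 12.51 and 12.49 are made up for illustration, and the helper names are hypothetical. This shows only why low precision can erase a small margin, not the paper's actual recomputation pipeline.

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def to_bfloat16(x):
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of a float32, so we round the float32
    bit pattern and zero the low 16 bits. NaN handling is omitted.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)       # ties go to the even mantissa
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Two hypothetical logits that are close but not equal.
top1, top2 = 12.51, 12.49

margin_bf16 = to_bfloat16(top1) - to_bfloat16(top2)   # both round to 12.5
margin_fp32 = to_float32(top1) - to_float32(top2)     # gap survives
```

Between 8 and 16, adjacent bfloat16 values are 0.0625 apart, so both logits collapse to 12.5 and the margin reads as an exact tie, while float32 keeps a margin of roughly 0.02.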

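🔬 Method Details

Problem definition: Large language models perform well across many tasks, but the geometry of their internal representation space, and of the Voronoi tessellation in particular, is still poorly understood. Existing methods struggle to adjust a model's decision boundaries without retraining, which limits how far robustness and generalization can be improved after the fact.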

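Core idea: The paper reshapes a model's Voronoi tessellation with margin refinement procedures (MRP). MRP adjusts token-decision margins to improve the model's geometry without retraining the whole model. Wider token-decision margins make the prediction harder to flip under input perturbations, which should improve robustness.

Technical framework: MRP consists of two main steps. First, token-decision margins are computed for the model on a validation set. Second, an optimization objective (direct margin maximization or Fisher information distance maximization) is used to adjust the model's parameters so that the margins widen. The optimization is a short post-hoc run, which limits overfitting to the validation set.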

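Key innovation: The paper introduces the concept of MRP and explores two optimization objectives: direct margin maximization and Fisher information distance maximization. Unlike conventional training, MRP does not retrain the whole model; it makes small parameter adjustments to improve the model's geometry. Fisher information distance maximization is the more robust objective, because it avoids the collateral damage that direct margin maximization can cause.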
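
As background for the second objective, the sketch below computes the Fisher-Rao (Fisher information) distance between two categorical distributions, which has the closed form 2·arccos(Σ√(pᵢqᵢ)). Whether the paper uses exactly this form, and between which pairs of distributions, is not stated in this summary, so treat it as an illustration of the quantity rather than the paper's objective. The logit vectors are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fisher_rao_distance(p, q):
    """Geodesic distance on the probability simplex under the Fisher metric."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    bc = min(1.0, max(0.0, bc))                        # guard against rounding
    return 2.0 * math.acos(bc)

# Hypothetical next-token distributions over a 3-token vocabulary.
p_sharp = softmax([4.0, 0.0, 0.0])   # confident prediction
p_flat = softmax([0.5, 0.0, 0.0])    # near-ambiguous prediction

d = fisher_rao_distance(p_sharp, p_flat)
```

The distance is symmetric, zero for identical distributions, and reaches its maximum of π for distributions with disjoint support.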

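Key design: In the paper, the margin is defined as the distance between a token vector and its nearest neighboring token vector. Direct margin maximization adjusts the model parameters by gradient ascent to increase the margin. Fisher information distance maximization instead uses the Fisher information matrix to constrain the parameter updates, reducing the impact on the rest of the model. In the experiments, the author sweeps the intervention strength (λ) and evaluates the effect of MRP on model performance and on token-level gains.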
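
A toy version of direct margin maximization by gradient ascent might look like the following. It is a sketch under strong assumptions: the margin is taken as the target logit minus the best rival logit, only the two unembedding rows that define the margin are updated, and a single hidden state is used. Which parameters the paper actually updates, and how λ enters the objective, are not stated in this summary; `widen_margin`, `lr`, and `steps` are hypothetical names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin(hidden, unembed, target):
    """Return (target logit minus best rival logit, rival token id)."""
    scores = [dot(hidden, row) for row in unembed]
    rival = max((i for i in range(len(scores)) if i != target),
                key=lambda i: scores[i])
    return scores[target] - scores[rival], rival

def widen_margin(hidden, unembed, target, lr=0.05, steps=10):
    """Gradient ascent on the margin w.r.t. the two rows that define it."""
    rows = [list(row) for row in unembed]   # work on a copy
    for _ in range(steps):
        _, rival = margin(hidden, rows, target)
        # d(margin)/d(row_target) = hidden; d(margin)/d(row_rival) = -hidden
        for k, h in enumerate(hidden):
            rows[target][k] += lr * h
            rows[rival][k] -= lr * h
    return rows

# Toy setup: a hidden state near the boundary between tokens 0 and 1.
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
HIDDEN = [0.6, 0.5]

before, _ = margin(HIDDEN, UNEMBED, target=0)
after, _ = margin(HIDDEN, widen_margin(HIDDEN, UNEMBED, target=0), target=0)
```

The toy has no notion of collateral damage: in a real model the same rows serve every other position, so pushing them for one hidden state can flip decisions elsewhere, which is the trade-off the λ sweep measures.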

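📊 Experimental Highlights

Fisher information distance maximization achieves a +28% median margin improvement at λ = 0.6 while leaving downstream benchmarks unchanged. Both MRP variants can correct about 16,300 of the 256K evaluated positions, but Fisher MRP causes less collateral damage, and its gains concentrate in high-frequency structural tokens. The study also validates Mabrok's (2026) linear scaling law with R^2 = 0.9997.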

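🎯 Application Scenarios

The results could be applied to improving the robustness and generalization of large language models, particularly under adversarial attacks and noisy data. Optimizing the decision boundaries could make a model more resistant to malicious inputs and improve its behavior in real-world settings. The method could also be used in model compression and knowledge distillation, where adjusting the model's geometry might help reduce model size while preserving performance.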

📄 Abstract (original)

Language models operate on discrete tokens but compute in continuous vector spaces, inducing a Voronoi tessellation over the representation manifold. We study this tessellation empirically on Qwen3.5-4B-Base, making two contributions. First, using float32 margin recomputation to resolve bfloat16 quantization artifacts, we validate Mabrok's (2026) linear scaling law of the expressibility gap with $R^2$ = 0.9997 - the strongest confirmation to date - and identify a mid-layer geometric ambiguity regime where margin geometry is anti-correlated with cross-entropy (layers 24-28, $ρ$ = -0.29) before crystallizing into alignment at the final layer ($ρ$ = 0.836). Second, we show that the Voronoi tessellation of a converged model is reshapable through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. We compare direct margin maximization against Fisher information distance maximization across a dose-response sweep. Both methods find the same ceiling of ~16,300 correctable positions per 256K evaluated, but differ critically in collateral damage. Margin maximization damage escalates with intervention strength until corrections are overwhelmed. Fisher damage remains constant at ~5,300 positions across the validated range ($λ$ = 0.15-0.6), achieving +28% median margin improvement at $λ$ = 0.6 with invariant downstream benchmarks - a geometric reorganization that compresses the expressibility gap while preserving its scaling law. However, frequency and token-class audits reveal that gains concentrate in high-frequency structural tokens (84% of net corrections at $λ$ = 0.6), with content and entity-like contributions shrinking at higher $λ$. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.
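
As a rough reading of the reported figures, the following back-of-the-envelope calculation assumes that a net count is corrected positions minus damaged positions, that the Fisher run reaches the ~16,300 ceiling, and that 256K means about 256,000 positions. None of these assumptions is confirmed by the abstract, so the result is an upper-bound estimate only.

$$
16{,}300 - 5{,}300 \approx 11{,}000 \ \text{net positions}, \qquad \frac{11{,}000}{256{,}000} \approx 4.3\%
$$

Under those assumptions, the net benefit touches only a few percent of the evaluated positions, which is consistent with the abstract's framing of Fisher MRP as a polishing tool rather than a large-scale correction.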
