The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

Authors: Zhanqi Zhang, Hua-Dong Xiong, Robert C. Wilson, Mikio Aoi, Marcelo G. Mattar, Li Ji-An

Categories: cs.LG, cs.CL

Published: 2026-05-08

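💡 One-Sentence Takeaway

Identifies the "Position Curse" in LLMs and introduces the PosBench dataset, showing that fine-tuning improves position-based retrieval in sequences.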

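🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: large language models, position awareness, code understanding, LoRA fine-tuning, sequence modeling, agent development, benchmarking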

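📋 Key Points

Core problem: LLMs can locate a specific fact buried in a very long context, yet they fail to retrieve the last few items in a short list. The authors call this the "Position Curse"; backward retrieval substantially lags forward retrieval.
Method: The authors build PosBench, a position-focused training dataset, and fine-tune models on it with LoRA to test whether post-training can repair the gap in position-based indexing.
Results: Fine-tuning improves both forward and backward retrieval and generalizes to the held-out PyIndex code-understanding benchmark, although absolute performance remains far from saturated.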

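📝 Abstract (Translated Summary)

Modern large language models (LLMs) excel at "needle in a haystack" tasks, yet perform poorly when asked to retrieve the last few items in a short list. The authors call this failure the "Position Curse". For example, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To quantify the failure, the authors evaluate two complementary queries: given a position, retrieve the item; and given an item, return its position. Across both open-source and closed-source models, backward retrieval is substantially weaker than forward retrieval. To test whether post-training can repair the deficit, the authors construct the PosBench dataset. LoRA fine-tuning improves both forward and backward retrieval and generalizes to the PyIndex code-understanding benchmark, but absolute performance remains far from saturated. As LLM agents increasingly need precise indexing over large codebases, position-based retrieval emerges as a key capability for future pretraining objectives and model design.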

🔬 Method Details

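Problem definition: The paper targets an asymmetry in how LLMs handle positions in a sequence. Models do well when a position is counted forward from the start of a list, but accuracy drops sharply when the position is counted backward from the end or given relative to another item. The gap matters most in settings that need precise indexing, such as code understanding.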
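To make the two directions concrete, here is a minimal sketch of the ground truth for both query types, counted from either endpoint of a list. The function names `item_at` and `position_of` and the 1-based offset convention are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def item_at(items: Sequence[str], offset: int, from_end: bool = False) -> str:
    """Position-to-item: return the item `offset` places from one endpoint.

    `offset` is 1-based (an illustrative convention): offset=1 is the first
    item, or the last item when from_end=True.
    """
    if not 1 <= offset <= len(items):
        raise IndexError("offset out of range")
    return items[-offset] if from_end else items[offset - 1]


def position_of(items: Sequence[str], item: str, from_end: bool = False) -> int:
    """Item-to-position: return the 1-based position of `item`.

    Assumes the items are unique, so the position is well defined.
    """
    index = list(items).index(item)
    return len(items) - index if from_end else index + 1


letters = ["q", "d", "x", "m", "t"]
print(item_at(letters, 2))                 # forward query: second item
print(item_at(letters, 2, from_end=True))  # backward query: second-to-last item
```

Both directions are trivial for a program; the summary's point is that the backward variant is where LLM accuracy drops.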

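Core idea: According to this summary, models lack explicit modeling of "backward" relations within a sequence. PosBench covers several position-query patterns (forward and backward offsets, different anchors) so that training pushes the model to learn relative positions within the sequence and reduces its over-reliance on the start of the sequence.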

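Technical framework: The study uses LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method. A training set is first built from PosBench, covering combinations of position-to-item and item-to-position queries; a pretrained model is then fine-tuned on it with supervised learning.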
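For reference, the standard LoRA formulation keeps a pretrained weight matrix $W_0$ frozen and trains only a low-rank update. This is the general definition of the method, not a detail reported in this summary; the rank $r$ and scaling factor $\alpha$ used in the paper are not stated here.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Only $A$ and $B$ receive gradients, so the number of trainable parameters per adapted matrix is $r(d + k)$ rather than $dk$.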

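Key contribution: The work systematically defines and quantifies the "Position Curse", and its results suggest that the deficit is not an absolute limit of the model architecture: task-specific post-training can mitigate it. This gives empirical motivation for introducing position-aware objectives during pretraining.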

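Key design: The design of PosBench is central. Each position is specified by an anchor (the start of the list, the end of the list, or a particular item) and an offset (forward or backward). These structured position queries are intended to make the model learn relative structure within the sequence instead of relying only on absolute position encodings.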
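The anchor-plus-offset scheme can be sketched as follows. This is an illustrative reconstruction, not the PosBench generator: the `PositionQuery` fields, the `resolve` and `sample_query` helpers, and the convention that the `start` and `end` anchors sit just outside the list (so that "1 forward from start" is the first item) are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PositionQuery:
    items: List[str]
    anchor: str      # "start", "end", or an item in the list
    direction: str   # "forward" or "backward"
    offset: int      # number of steps from the anchor (>= 1)
    answer: str


def resolve(items: List[str], anchor: str, direction: str, offset: int) -> Optional[str]:
    """Return the item reached from the anchor, or None if it falls outside the list."""
    step = offset if direction == "forward" else -offset
    if anchor == "start":
        base = -1           # virtual slot before the first item (assumed convention)
    elif anchor == "end":
        base = len(items)   # virtual slot after the last item (assumed convention)
    else:
        base = items.index(anchor)  # assumes unique items
    target = base + step
    return items[target] if 0 <= target < len(items) else None


def sample_query(items: List[str], rng: random.Random) -> PositionQuery:
    """Draw random (anchor, direction, offset) triples until one stays in range."""
    while True:
        anchor = rng.choice(["start", "end"] + items)
        direction = rng.choice(["forward", "backward"])
        offset = rng.randint(1, len(items))
        answer = resolve(items, anchor, direction, offset)
        if answer is not None:
            return PositionQuery(items, anchor, direction, offset, answer)


words = ["apple", "river", "stone", "cloud"]
print(resolve(words, "end", "backward", 2))   # second-to-last item
print(resolve(words, "river", "forward", 1))  # item right after "river"
```

Rejection sampling keeps the sketch short; a real generator would more likely enumerate valid combinations to balance forward and backward queries.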

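📊 Experimental Highlights

Before fine-tuning, backward retrieval lagged substantially behind forward retrieval across both open-source and frontier closed-source models. After LoRA fine-tuning on PosBench, both forward and backward retrieval improved, and the gains generalized to the held-out PyIndex code-understanding benchmark. Absolute performance nonetheless remains far from saturated, so targeted post-training mitigates the deficit without fully resolving it.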
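A forward-versus-backward comparison of this kind comes down to computing accuracy separately per direction. The sketch below assumes a hypothetical record format of `(direction, predicted, expected)` triples; the toy records are placeholders and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_direction(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute exact-match accuracy for each query direction.

    Each record is a (direction, predicted, expected) triple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for direction, predicted, expected in records:
        total[direction] += 1
        correct[direction] += int(predicted.strip() == expected.strip())
    return {d: correct[d] / total[d] for d in total}


# Toy placeholder records for illustration only.
toy = [
    ("forward", "d", "d"),
    ("forward", "x", "x"),
    ("backward", "m", "m"),
    ("backward", "q", "t"),
]
print(accuracy_by_direction(toy))
```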

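🎯 Application Scenarios

The work is relevant to AI coding agents. In large-codebase maintenance, automated refactoring, and debugging of complex logic, a model must index function definitions, variable references, and line numbers precisely. Better position-based retrieval could reduce indexing errors when an agent navigates code, improving the accuracy of code understanding and editing.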

📄 Abstract (Original)

Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluated two complementary queries: given a position in a sequence (of letters or words), retrieve the corresponding item; and given an item, return its position. Each position is specified as a forward or backward offset from an anchor, either an endpoint of the list (its start or end) or another item in the list. Across both open-source and frontier closed-source models, backward retrieval substantially lags forward retrieval. To test whether this capability can be rescued by post-training, we constructed PosBench, a position-focused training dataset. LoRA fine-tuning improves both forward and backward retrieval and generalizes to a held-out code-understanding benchmark (PyIndex), yet absolute performance remains far from saturated. As LLM coding agents increasingly operate over large codebases where precise indexing becomes essential for code understanding and editing, position-based retrieval emerges as a key capability for future pretraining objectives and model design.
