PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Authors: Boning Li, Baoxiang Wang, Longbo Huang
Categories: cs.AI, cs.GT
Published: 2026-05-28
Comments: 45 pages, 3 figures
🔗 Code/Project: https://github.com/lbn187/PokerSkill
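💡 One-sentence takeaway
Proposes PokerSkill, a training-free and solver-free framework that lets LLMs play poker competitively by grounding their actions in rule-based skills.
🎯 Matched area: Pillar 9: Embodied Foundation Models
Keywords: poker, large language models, training-free agents, rule-based skills, artificial intelligence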
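📋 Key points
- The dominant poker AI approach relies on equilibrium solvers built on counterfactual regret minimization, which require millions of core-hours of training.
- PokerSkill combines rule-based poker skills with LLMs, giving a poker agent that needs neither training nor solver access.
- In experiments against GTOWizard, LLMs using PokerSkill cut their losses by 49-61% relative to default-prompt baselines and outperform the strong bot Slumbot.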
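📝 Abstract (summary)
Poker is a landmark challenge for artificial intelligence, and the dominant approach relies on equilibrium solvers that consume large amounts of compute. Large language models (LLMs) hold extensive poker knowledge, yet when asked to play directly they perform far below solver-based agents. This paper introduces the PokerSkill framework, which closes that gap by using detailed rule-based poker skills as a structured action-grounding interface for LLMs. Against GTOWizard, GPT-5.5 XHigh with PokerSkill achieves -57 ± 21 mbb/hand; across the tested models, losses fall by 49-61% compared to default-prompt baselines, showing competitive play without training or solver access.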
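🔬 Method details
Problem definition: The paper addresses the weak performance of LLMs when they play poker directly. The dominant alternative, equilibrium solvers built on counterfactual regret minimization, requires millions of core-hours of training, while traditional rule-based agents are interpretable and training-free but have a strategic ceiling far below equilibrium play.
Core idea: PokerSkill uses rule-based skills designed by human poker experts as a structured interface that constrains the LLM's choice to reasonable actions, which improves decision quality.
Technical framework: A deterministic context engine analyzes the current game state and retrieves only the relevant fragments from a layered skill library, so that the LLM chooses among reasonable actions.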
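As a rough illustration of the retrieval step described above, the following Python sketch models a deterministic context engine. Everything in it is hypothetical: the `GameState` fields, the layer names, the placeholder fragment texts, and the `retrieve_skills` and `build_prompt` functions are assumptions made for illustration and are not taken from the PokerSkill code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    """Hypothetical summary of the current decision point."""
    street: str        # "preflop", "flop", "turn", or "river"
    position: str      # "in_position" or "out_of_position"
    facing_bet: bool   # True if the opponent has bet or raised


# Hypothetical layered skill library. The fragment texts are placeholders,
# not real poker advice and not the paper's expert-written skills.
SKILL_LIBRARY = {
    "street": {
        "preflop": "[placeholder] preflop rules",
        "flop": "[placeholder] flop rules",
    },
    "position": {
        "in_position": "[placeholder] in-position rules",
        "out_of_position": "[placeholder] out-of-position rules",
    },
    "situation": {
        "facing_bet": "[placeholder] rules for responding to a bet",
        "unopened": "[placeholder] rules for acting first",
    },
}


def retrieve_skills(state: GameState) -> list[str]:
    """Pick at most one fragment per layer; no randomness and no model calls."""
    keys = {
        "street": state.street,
        "position": state.position,
        "situation": "facing_bet" if state.facing_bet else "unopened",
    }
    # Layers with no matching entry contribute nothing to the prompt.
    return [
        SKILL_LIBRARY[layer][key]
        for layer, key in keys.items()
        if key in SKILL_LIBRARY[layer]
    ]


def build_prompt(state: GameState, fragments: list[str]) -> str:
    """Assemble the retrieved fragments into a prompt for the LLM."""
    rules = "\n".join(f"- {fragment}" for fragment in fragments)
    return f"State: {state}\nRelevant skills:\n{rules}\nChoose one action."
```

Because the lookup is a pure function of the state, the same state always yields the same fragments, which is the property the word "deterministic" points at; only the final action choice is left to the LLM.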
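Key innovation: Rule-based skills alone do not constitute a strong strategy, and LLMs alone cannot play well, but their combination yields an agent that needs neither training nor solver access. To the authors' knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries.
Key design: The skill library is designed entirely by human poker experts, and the context engine surfaces only the fragments relevant to the current state, which lets the LLM make reasonable decisions in complex game states.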
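One way to picture what "constraining the LLM's choice to reasonable actions" could mean in practice is a validation step around the model's reply. The sketch below is an assumption rather than the paper's implementation: the `ground_action` function, the action strings, and the fallback policy are all hypothetical.

```python
def ground_action(llm_reply: str, allowed_actions: list[str], fallback: str) -> str:
    """Accept the reply only if it names an allowed action, else use the fallback.

    Hypothetical helper: the allowed set would come from whatever skill
    fragments were retrieved for the current state.
    """
    if fallback not in allowed_actions:
        raise ValueError("fallback must be one of the allowed actions")
    # Normalize whitespace and case before comparing against the allowed set.
    choice = llm_reply.strip().lower()
    return choice if choice in allowed_actions else fallback


# Usage with made-up action names:
allowed = ["fold", "call", "raise_pot"]
print(ground_action(" Call \n", allowed, fallback="fold"))        # call
print(ground_action("shove all in", allowed, fallback="fold"))    # fold
```

The design choice here is that the model never acts outside the allowed set: an off-menu reply is replaced rather than executed.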
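📊 Experimental highlights
Against GTOWizard, a state-of-the-art GTO benchmark, GPT-5.5 XHigh with PokerSkill achieves -57 ± 21 mbb/hand, Claude Opus 4.6 achieves -80 ± 29 mbb/hand, and Claude Opus 4.7 achieves -87 ± 64 mbb/hand. This is a 49-61% reduction in losses compared to default-prompt baselines, and it outperforms the strong bot Slumbot.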
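For readers unfamiliar with the unit, mbb/hand is milli-big-blinds per hand: net winnings in big blinds, divided by hands played, times 1000. The Python sketch below shows that arithmetic and how a relative loss reduction is computed. The function names and the sample numbers are hypothetical and are not results from the paper, whose baseline win rates are not given in this summary.

```python
def mbb_per_hand(net_big_blinds: float, hands: int) -> float:
    """Win rate in milli-big-blinds per hand (negative means losing)."""
    if hands <= 0:
        raise ValueError("hands must be positive")
    return 1000.0 * net_big_blinds / hands


def loss_reduction_pct(baseline: float, improved: float) -> float:
    """Percent by which a loss shrinks; both inputs are win rates in mbb/hand."""
    if baseline >= 0:
        raise ValueError("baseline must be a loss (negative win rate)")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical example: losing 60 big blinds over 1000 hands.
rate = mbb_per_hand(-60, 1000)
print(rate)                                  # -60.0

# Hypothetical example: a baseline of -120 mbb/hand improved to -60 mbb/hand.
print(loss_reduction_pct(-120.0, -60.0))     # 50.0
```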
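🎯 Application scenarios
Potential applications of PokerSkill include online poker, intelligent game-playing agents, and poker education and training. Because it needs no training, it may enable strong poker decision-making even when compute resources are limited.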
📄 Abstract (original)
Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers built on counterfactual regret minimization, requiring millions of core-hours of training. Large Language Models (LLMs) possess extensive poker knowledge but perform far below solver-based agents when asked to play directly. Traditional rule-based poker agents are interpretable and training-free, but their strategic ceiling remains far below equilibrium play. We introduce PokerSkill, a training-free and solver-free framework that bridges this gap by using detailed rule-based poker skills as a structured action-grounding interface for LLMs. A deterministic context engine analyzes the current state and retrieves only the relevant fragments from a layered skill library, which is entirely designed by human poker experts, constraining the LLM's choice to reasonable actions. Against GTOWizard, a state-of-the-art GTO benchmark, GPT-5.5 XHigh with PokerSkill achieves -57 ± 21 mbb/hand, Claude Opus 4.6 achieves -80 ± 29 mbb/hand and Claude Opus 4.7 achieves -87 ± 64 mbb/hand, reducing losses by 49-61% compared to default-prompt baselines and outperforming the strong bot Slumbot. Our key finding is that rule-based skills alone do not constitute a strong strategy, and LLMs alone cannot play well, but their combination yields an agent that requires neither training nor solver access yet competes with systems built on millions of core-hours of computation. To our knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries. Code is available at https://github.com/lbn187/PokerSkill.