From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks

Authors: Neh Majmudar, Anne Huang, Jinfan Frank Hu, Elena Filatova

Category: cs.CL

Published: 2026-05-13

Comments: Proceedings of the Fifteenth Language Resources and Evaluation Conference

DOI: 10.63317/3yq3khhygtya

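💡 One-Sentence Summary

Proposes a systematic procedure for converting Rosetta Stone puzzles into the Match-Up format.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: linguistic puzzles, Rosetta Stone, Match-Up, dataset, linguistic reasoning, educational technology, artificial intelligence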

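📋 Key Points

Creating linguistic puzzles is complex and time-consuming, and there is no efficient mechanism for producing new ones.
The paper proposes a systematic procedure that converts existing Rosetta Stone puzzles into Match-Up puzzles.
Experiments show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles; the work also contributes a new paired dataset and an evaluation of puzzle difficulty across the two formats.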


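🔬 Method Details

Problem definition: The paper addresses the complexity and time cost of creating linguistic puzzles. Composing Rosetta Stone puzzles by hand is slow, which limits how many new puzzles can be produced.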

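Core idea: Instead of composing every puzzle from scratch, the authors propose a systematic procedure that converts existing Rosetta Stone puzzles into Match-Up puzzles, which streamlines puzzle creation. The conversion yields new linguistic puzzles quickly enough to help meet the needs of competitions.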

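Technical framework: The pipeline covers puzzle selection, application of conversion rules, and generation of new puzzles. Existing Rosetta Stone puzzles are first identified, predefined conversion rules then turn them into Match-Up puzzles, and the resulting puzzles are finally evaluated and validated.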
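The conversion step can be illustrated with a minimal sketch. It assumes, hypothetically, that a Rosetta Stone puzzle is stored as a list of aligned (source-language phrase, English gloss) pairs, and that the Match-Up counterpart presents the same phrases with the glosses shuffled, keeping a hidden answer key. The names `RosettaPuzzle`, `MatchUpPuzzle` and `to_match_up` are illustrative only; this is not the authors' procedure, whose conversion rules are not spelled out in this summary.

```python
import random
from dataclasses import dataclass


@dataclass
class RosettaPuzzle:
    # Hypothetical representation: aligned (source-language phrase, English gloss) pairs.
    pairs: list[tuple[str, str]]


@dataclass
class MatchUpPuzzle:
    phrases: list[str]          # source-language phrases, in their original order
    glosses: list[str]          # English glosses, shuffled
    answer_key: dict[int, int]  # phrase index -> position of its gloss in `glosses`


def to_match_up(puzzle: RosettaPuzzle, seed: int = 0) -> MatchUpPuzzle:
    """Turn aligned pairs into a matching task by shuffling the glosses (illustrative only)."""
    rng = random.Random(seed)
    n = len(puzzle.pairs)
    order = list(range(n))
    # Reshuffle until no gloss stays beside its own phrase (only possible when n >= 2).
    while n > 1 and any(pos == src for pos, src in enumerate(order)):
        rng.shuffle(order)
    phrases = [src for src, _ in puzzle.pairs]
    # order[pos] is the index of the original pair whose gloss now sits at `pos`.
    glosses = [puzzle.pairs[src][1] for src in order]
    answer_key = {src: pos for pos, src in enumerate(order)}
    return MatchUpPuzzle(phrases, glosses, answer_key)
```

Requiring a derangement (no gloss left in its original slot) is a design choice of this sketch, so that the Match-Up version never leaks answers through ordering; real conversion rules may impose further constraints.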

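Key innovation: The main contribution is an efficient conversion mechanism that makes puzzle creation more systematic and standardized, offering a faster route to new puzzles than composing each one from scratch.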

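Key design: The procedure relies on a set of conversion rules intended to keep each generated Match-Up puzzle aligned in structure and difficulty with its Rosetta Stone source. Evaluation criteria are then applied to check the validity and difficulty of the new puzzles.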

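📊 Experimental Highlights

Both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles: they either solve a puzzle completely or fail it entirely. The paired Rosetta Stone and Match-Up puzzles also support a detailed comparison of difficulty across the two formats, offering insights into both human and machine linguistic reasoning.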
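To make the all-or-nothing observation concrete, here is a hypothetical scoring sketch. It assumes a solver's Match-Up answer is scored as the fraction of phrases matched to the correct gloss, and calls a set of per-puzzle scores all-or-nothing when every score is exactly 0 or 1. The function names and this scoring rule are assumptions for illustration; the paper's own scoring scheme may differ.

```python
from fractions import Fraction


def match_up_score(answer: dict[int, int], key: dict[int, int]) -> Fraction:
    """Fraction of phrases whose chosen gloss position equals the answer key's (hypothetical metric)."""
    if not key:
        raise ValueError("answer key is empty")
    correct = sum(1 for phrase, gloss in key.items() if answer.get(phrase) == gloss)
    return Fraction(correct, len(key))


def is_all_or_nothing(scores: list[Fraction]) -> bool:
    """True when every puzzle was either fully solved (1) or fully failed (0)."""
    return all(s in (0, 1) for s in scores)
```

Exact fractions are used so that a perfect score compares equal to 1 without floating-point tolerance.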

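🎯 Application Scenarios

Potential applications include education, linguistics competitions, and artificial intelligence. An efficient puzzle-generation method can help teachers and educators quickly create new linguistic puzzles that develop students' language skills. The paired dataset can also serve as a benchmark for large language models, supporting research on machine language understanding.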

📄 Original Abstract

In this paper, we examine linguistic puzzles used in high school linguistics competitions, focusing on two common formats: Rosetta Stone and Match-Up. We propose a systematic procedure for converting existing Rosetta Stone puzzles into corresponding Match-Up counterparts. Because linguistic puzzle creation is complex and time-consuming, our method provides an efficient way to accelerate the generation of new puzzles. We evaluate the resulting Rosetta Stone-Match-Up pairs with both human participants and large language models (LLMs). Our results show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles, either solving them completely or failing entirely. This work contributes a new dataset of paired puzzles and provides a detailed evaluation of puzzle difficulty across formats, offering insights into both human and machine linguistic reasoning.
