| 1 | FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | FarsEval-PKBETS: a new, diverse benchmark for evaluating Persian large language models | large language model | |
| 2 | PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines | PROMPTEVALS: a dataset of assertions and guardrails for custom production large language model pipelines | large language model | |
| 3 | Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data | Proposes TRANS-ZERO, which uses self-play and large language models to achieve multilingual translation without parallel data | large language model | |
| 4 | A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models | Proposes HSPIM, a hierarchical framework based on large language models, for assessing the innovation of scientific papers. | large language model | ✅ |
| 5 | Functional Abstraction of Knowledge Recall in Large Language Models | Uses functional abstraction to understand how LLMs recall knowledge, and improves in-context knowledge editing | large language model | |
| 6 | a1: Steep Test-time Scaling Law via Environment Augmented Generation | Proposes the Environment Augmented Generation (EAG) framework to improve the reliability and accuracy of LLMs on complex reasoning tasks. | large language model chain-of-thought | |
| 7 | Causality for Natural Language Processing | Explores in depth the causal reasoning capabilities of large language models and their applications in natural language processing | large language model | |
| 8 | BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation | BookWorld: builds interactive agent societies from the worlds of novels for creative story generation. | large language model | ✅ |
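| 9 | Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval | Proposes using LLMs to generate synthetic negative samples for training dense retrieval models, though the results fall short of traditional methods. | large language model | |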