| 1 |
Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard |
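Proposes an LLM evaluation benchmark based on grid-based game competitions, used to assess LLMs' rule understanding and strategic thinking. |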
large language model |
|
|
| 2 |
On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments |
Using large language models for Wizard of Oz (WoZ) experiments: behavior identification and experiment pipeline construction |
large language model |
|
|
| 3 |
Was it Slander? Towards Exact Inversion of Generative Language Models |
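Proposes a search-based adversarial attack method to evaluate how well large language models resist attacks that trace the provenance of forged outputs |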
large language model |
|
|
| 4 |
Why should we ever automate moral decision making? |
Discusses the necessity and risks of automating AI moral decision making, and analyzes its feasibility and challenges |
foundation model |
|
|
| 5 |
Be More Real: Travel Diary Generation Using LLM Agents and Individual Profiles |
MobAgent: generating more realistic urban travel diaries using LLM agents and individual profiles |
large language model |
|
|
| 6 |
Rectifier: Code Translation with Corrector via LLMs |
Proposes Rectifier, which uses a micro, general-purpose model to correct LLM code translation errors and improve migration quality. |
large language model |
|
|