| 1 |
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning |
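Proposes the HICRA algorithm, which uses reinforcement learning to improve hierarchical reasoning in LLMs and optimize strategic planning. |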
reinforcement learning, large language model
|
|
| 2 |
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs |
Reveals the personality illusion in LLMs: a study of the dissociation between self-reports and behavior. |
RLHF, large language model
|
|
| 3 |
CausalARC: Abstract Reasoning with Causal World Models |
Proposes CausalARC for causal abstract reasoning under low-data and distribution-shift conditions. |
world model |
|
|
| 4 |
Language Models Do Not Follow Occam's Razor: A Benchmark for Inductive and Abductive Reasoning |
Proposes the InAbHyD benchmark to test the inductive and abductive reasoning of LLMs, finding that they do not follow Occam's razor. |
world model, large language model
|
|