| 9 | Holistic Utility Preference Learning for Listwise Alignment | Proposes DRPO, which tackles listwise preference learning for LLM alignment by optimizing ranking preferences. | reinforcement learning, preference learning, RLHF |
| 10 | Anchored Alignment for Self-Explanations Enhancement | Proposes an anchored alignment method that improves large language models' self-explanation ability without labeled data. | DPO, direct preference optimization, large language model |
| 11 | Goal Inference from Open-Ended Dialog | Proposes an online method that infers goals through dialog, improving embodied agents' ability to accomplish user goals. | RLHF, large language model |
| 12 | Approximating Auction Equilibria with Reinforcement Learning | Proposes a reinforcement-learning-based method for approximating auction equilibria, addressing the computational challenges of complex auction settings. | reinforcement learning |
| 13 | Transformer Guided Coevolution: Improved Team Selection in Multiagent Adversarial Team Games | Proposes the BERTeam algorithm, which uses a Transformer to improve team selection in multiagent adversarial team games. | reinforcement learning, deep reinforcement learning |