| 1 |
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback |
Proposes the LoCo-RLHF framework, which uses low-rank contextual information to address reward learning from heterogeneous human feedback. |
reinforcement learning, offline reinforcement learning, RLHF |
|
|
| 2 |
Numerical solutions of fixed points in two-dimensional Kuramoto-Sivashinsky equation expedited by reinforcement learning |
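Proposes a reinforcement-learning-optimized JFNK method that accelerates solving for fixed points of the two-dimensional Kuramoto-Sivashinsky equation. |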
reinforcement learning, deep reinforcement learning, DRL |
|
|
| 3 |
Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe |
Proposes a randomized Frank-Wolfe algorithm to optimize human preference learning. |
reinforcement learning, preference learning |
|
|
| 4 |
Enhancing Adversarial Robustness of Deep Neural Networks Through Supervised Contrastive Learning |
Combines supervised contrastive learning with a margin loss to improve the adversarial robustness of deep neural networks. |
contrastive learning |
|
|
| 5 |
Minimax-Optimal Multi-Agent Robust Reinforcement Learning |
Extends the Q-FTRL algorithm to RMGs, achieving minimax-optimal multi-agent robust reinforcement learning. |
reinforcement learning |
|
|
| 6 |
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization |
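Proposes a graph-attention-based causal discovery method whose performance is improved through trust-region-navigated clipping policy optimization. |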
reinforcement learning, PPO |
|
|