| 1 |
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles |
Proposes UP-RLHF, which mitigates overoptimization in RLHF through uncertainty penalization |
reinforcement learning RLHF large language model |
|
|
| 2 |
Causal State Distillation for Explainable Reinforcement Learning |
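Proposes a causal state distillation method that improves the explainability of reinforcement learning decisions and addresses the limitations of reward decomposition methods. |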
reinforcement learning distillation |
|
|
| 3 |
Laboratory Experiments of Model-based Reinforcement Learning for Adaptive Optics Control |
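Proposes a model-based reinforcement learning method for adaptive optics control and validates its performance in a laboratory setting. |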
reinforcement learning |
|
|
| 4 |
A Novel Reinforcement Learning Routing Algorithm for Congestion Control in Complex Networks |
Proposes a reinforcement learning-based routing algorithm for congestion control in complex networks. |
reinforcement learning |
|
|
| 5 |
Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations |
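Proposes the POSG algorithm, which uses state-only demonstrations to improve policy optimization in sparse-reward reinforcement learning |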
reinforcement learning deep reinforcement learning DRL |
|
|