| 1 |
Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework |
Proposes the LMGT framework to address reinforcement learning under sparse rewards |
reinforcement learning, large language model |
|
|
| 2 |
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn |
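Proposes the CHAIN method, which improves deep reinforcement learning performance by reducing the chain effect of value and policy churn |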
reinforcement learning, deep reinforcement learning, DRL |
|
|
| 3 |
Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions |
Proposes a sample- and oracle-efficient reinforcement learning algorithm for MDPs with linearly realizable value functions |
reinforcement learning |
|
|