| 1 |
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning |
Proposes Token Hidden Reward to steer exploration-exploitation in group relative deep reinforcement learning. |
reinforcement learning, deep reinforcement learning, large language model |
|
|
| 2 |
Deep Reinforcement Learning for Multi-Agent Coordination |
Proposes S-MADRL, a framework based on virtual pheromones, to enable efficient multi-agent coordination in crowded environments. |
reinforcement learning, deep reinforcement learning, curriculum learning |
|
|
| 3 |
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration |
Proposes the RAPO algorithm, which uses reinforcement learning exploration to improve LLM performance on complex reasoning tasks. |
reinforcement learning, large language model |
|
|
| 4 |
HOFLON: Hybrid Offline Learning and Online Optimization for Process Start-Up and Grade-Transition Control |
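Proposes HOFLON, which combines offline learning with online optimization to improve control performance for process start-up and grade transitions. |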
reinforcement learning, offline RL, offline reinforcement learning |
|
|
| 5 |
Distributed Area Coverage with High Altitude Balloons Using Multi-Agent Reinforcement Learning |
Proposes a distributed area coverage method for high-altitude balloons based on multi-agent reinforcement learning. |
reinforcement learning |
|
|