| 10 |
Training Large Language Models to Reason via EM Policy Gradient |
Proposes an EM policy gradient algorithm that improves LLM performance and interpretability on complex reasoning tasks |
reinforcement learning PPO large language model |
|
|
| 11 |
Cooperative Task Offloading through Asynchronous Deep Reinforcement Learning in Mobile Edge Computing for Future Networks |
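Proposes CTO-TP, a cooperative task offloading framework based on asynchronous deep reinforcement learning, to optimize latency and energy consumption in MEC for future networks. |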
reinforcement learning deep reinforcement learning |
|
|
| 12 |
Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning |
Plasticine: an open-source framework for accelerating research in plasticity-motivated deep reinforcement learning |
reinforcement learning deep reinforcement learning |
✅ |
|
| 13 |
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning |
RAGEN: understanding self-evolution in LLM agents through multi-turn reinforcement learning |
reinforcement learning large language model |
✅ |
|
| 14 |
CaRL: Learning Scalable Planning Policies with Simple Rewards |
CaRL: learns scalable planning policies from simple rewards, applied to autonomous driving. |
reinforcement learning PPO imitation learning |
|
|
| 15 |
ExOSITO: Explainable Off-Policy Learning with Side Information for Intensive Care Unit Blood Test Orders |
ExOSITO: explainable off-policy learning with side information for ICU blood test orders |
policy learning privileged information |
|
|
| 16 |
Do We Need Transformers to Play FPS Video Games? |
Finds that Transformers underperform traditional methods in first-person shooter games |
reinforcement learning offline reinforcement learning decision transformer |
|
|
| 17 |
Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization |
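Proposes the TCTO framework, which automates feature engineering through graph-driven path optimization and improves downstream task performance. |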
reinforcement learning |
|
|