| 1 |
Imitating Language via Scalable Inverse Reinforcement Learning |
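Proposes an inverse-reinforcement-learning-based fine-tuning method for language models that improves generation quality and diversity. |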
reinforcement learning, imitation learning, inverse reinforcement learning |
|
|
| 2 |
Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data |
Compares LLMs with classical machine learning on COVID-19 mortality prediction and finds that the classical methods perform better. |
predictive model, large language model |
|
|
| 3 |
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization |
Proposes PPO-DAP, which uses a diffusion model to improve the sample efficiency and exploration of PPO on continuous control tasks. |
reinforcement learning, PPO |
|
|
| 4 |
Revisiting Safe Exploration in Safe Reinforcement learning |
Proposes an EMCC-based SafeRL method that addresses the risks traditional methods incur during safe exploration. |
reinforcement learning |
|
|
| 5 |
Real-Time Recurrent Learning using Trace Units in Reinforcement Learning |
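Proposes a real-time recurrent learning method based on Trace Units that improves reinforcement learning performance in partially observable environments. |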
reinforcement learning |
|
|