| 1 |
Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning |
EduAlign framework: using reinforcement learning to improve the personalization and creativity of LLMs in education |
reinforcement learning large language model |
|
|
| 2 |
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge |
Proposes the MaPPO framework to improve preference alignment in large language models |
preference learning DPO direct preference optimization |
|
|
| 3 |
Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining |
Proposes TLRS, a trajectory-level reward shaping method that improves the efficiency and predictive power of formulaic alpha mining |
reinforcement learning reward shaping |
|
|
| 4 |
FAST: Similarity-based Knowledge Transfer for Efficient Policy Learning |
FAST: similarity-based knowledge transfer for efficient policy learning |
policy learning |
|
|
| 5 |
Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic |
Proposes a spatial-temporal reinforcement learning (STRL) framework for routing under non-Markovian network traffic |
reinforcement learning |
|
|