| 1 |
Studying the Korean Word-Chain Game with RLVR: Mitigating Reward Conflicts via Curriculum Learning |
Uses curriculum learning to mitigate reward conflicts when solving the Korean word-chain game with RLVR. |
reinforcement learning, curriculum learning, large language model |
|
|
| 2 |
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward |
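Proposes Low-probability Regularization (Lp-Reg) to counter the disappearance of exploratory tokens in RLVR, improving performance on complex reasoning tasks. |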
reinforcement learning, large language model |
✅ |
|
| 3 |
Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models |
Proposes Certifiable Safe-RLHF, which improves language-model safety through fixed-penalty optimization. |
RLHF, large language model |
|
|
| 4 |
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning |
Proposes multi-agent reinforcement learning for long-term mapping of river plumes. |
reinforcement learning, spatiotemporal |
|
|
| 5 |
Longitudinal Flow Matching for Trajectory Modeling |
Proposes Interpolative Multi-Marginal Flow Matching (IMMFM) to address sparse sampling and high dimensionality in trajectory modeling. |
flow matching |
|
|