| 1 |
Reinforcement Learning with Conditional Expectation Reward |
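Proposes Conditional Expectation Reward (CER), which uses the large language model itself as an implicit verifier to improve general reasoning ability. |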
reinforcement learning large language model |
✅ |
|
| 2 |
Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control |
Proposes the RAD framework, which controls risk in RLHF through stochastic dominance to improve safety and robustness. |
reinforcement learning RLHF |
|
|
| 3 |
Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning |
Proposes GR$^3$, which addresses length inflation in reinforcement learning through group relative reward rescaling. |
reinforcement learning RLHF |
|
|
| 4 |
UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery |
Proposes a multi-agent reinforcement learning scheme for dynamic UAV delivery of medical supplies. |
reinforcement learning PPO |
|
|
| 5 |
Graph-GRPO: Training Graph Flow Models with Reinforcement Learning |
Proposes Graph-GRPO, which trains graph flow models with reinforcement learning to optimize graph generation tasks. |
reinforcement learning flow matching |
|
|
| 6 |
Ergodicity in reinforcement learning |
Examines how non-ergodic reward processes affect reinforcement learning and analyzes existing solutions. |
reinforcement learning |
|
|
| 7 |
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models |
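Proposes Dynamics-Predictive Sampling (DPS), which accelerates reinforcement learning finetuning of reasoning ability in large models. |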
reinforcement learning large language model |
|
|
| 8 |
Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis |
EvoKernel: a value-driven memory approach for NPU kernel synthesis that enables cold-start drafting and continual refinement. |
reinforcement learning large language model |
|
|
| 9 |
ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning |
Proposes ReTabSyn to address tabular data synthesis in low-data and imbalanced settings. |
reinforcement learning |
|
|
| 10 |
Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning |
Proposes a reinforcement learning based method for tuning cluster schedulers, improving job performance and cluster utilization. |
reinforcement learning |
|
|
| 11 |
Adaptive Active Learning for Regression via Reinforcement Learning |
Proposes an adaptive active learning method for regression based on reinforcement learning, improving labeling efficiency. |
reinforcement learning |
|
|
| 12 |
Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression |
Proposes STemDist, a dataset distillation method with bi-dimensional compression for spatio-temporal forecasting. |
distillation |
|
|
| 13 |
Riemannian MeanFlow for One-Step Generation on Manifolds |
Proposes Riemannian MeanFlow for one-step generation on manifolds, improving the quality-efficiency trade-off. |
flow matching classifier-free guidance |
|
|