| 1 |
AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees |
AlphaInventory: Using large language models to evolve white-box inventory policies with deployment guarantees |
reinforcement learning, large language model |
|
|
| 2 |
A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence |
Proposes a policy-driven DRL framework for system-level tradeoff control in NR-U/Wi-Fi coexistence |
reinforcement learning, deep reinforcement learning, DRL |
|
|
| 3 |
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning |
ResRL: Improving LLM reasoning via negative-sample projection residual reinforcement learning |
reinforcement learning, large language model |
✅ |
|
| 4 |
Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning |
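Proposes the Augmented Lagrangian Multiplier Network (ALaM) to address training instability in reinforcement learning under state-wise safety constraints. |
reinforcement learning, SAC |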
|
|
| 5 |
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning |
Odysseus: Using reinforcement learning to scale vision-language models to 100+ step decision-making in games |
reinforcement learning, PPO |
|
|
| 6 |
Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation |
Proposes mini-batch risk measures for solving risk-averse Markov decision problems |
reinforcement learning |
|
|
| 7 |
Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation |
Proposes a double-oracle-efficient reinforcement learning algorithm to address computational bottlenecks in large-scale environments |
reinforcement learning |
|
|
| 8 |
Binomial flows: Denoising and flow matching for discrete ordinal data |
Proposes binomial flow models for denoising and flow matching on discrete ordinal data |
flow matching |
|
|
| 9 |
Free Energy Surface Sampling via Reduced Flow Matching |
Proposes the FES-FM method, which achieves efficient free energy surface sampling via reduced flow matching |
flow matching |
|
|
| 10 |
SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control |
SAVGO: Learning state-action value geometry with cosine similarity for continuous control |
reinforcement learning, representation learning |
|
|