| 18 |
BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning |
Proposes BiTrajDiff to address data distribution bias in offline reinforcement learning |
reinforcement learning policy learning offline RL |
|
|
| 19 |
How to craft a deep reinforcement learning policy for wind farm flow control |
Proposes a deep reinforcement learning policy to optimize wind farm flow control |
reinforcement learning deep reinforcement learning |
|
|
| 20 |
Debiasing Online Preference Learning via Preference Feature Preservation |
Proposes a preference feature preservation framework to address bias in online preference learning |
preference learning large language model |
|
|
| 21 |
Delphos: A reinforcement learning framework for assisting discrete choice model specification |
Proposes the Delphos framework to optimize the specification process for discrete choice models |
reinforcement learning deep reinforcement learning |
|
|
| 22 |
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library |
Proposes the ROLL library to address large-scale reinforcement learning optimization |
reinforcement learning reward design |
|
|
| 23 |
FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts |
Proposes FlowOE to address optimal execution in dynamic financial markets |
imitation learning flow matching |
|
|
| 24 |
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance |
Proposes an efficient online RFT method to address the reward-model training bottleneck in RLHF |
reinforcement learning PPO RLHF |
|
|
| 25 |
Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning |
Proposes EEDQN to address overestimation bias in deep reinforcement learning |
reinforcement learning deep reinforcement learning |
|
|
| 26 |
Distillation Robustifies Unlearning |
Proposes the UNDO method to make unlearning in large-scale models more robust |
distillation |
|
|
| 27 |
Model-Driven Graph Contrastive Learning |
Proposes MGCL to address the data augmentation problem in graph contrastive learning |
contrastive learning |
|
|
| 28 |
Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models |
Proposes Table-r1 to address table reasoning in small language models |
reinforcement learning |
|
|
| 29 |
Exponential Family Variational Flow Matching for Tabular Data Generation |
Proposes Exponential Family Variational Flow Matching to address tabular data generation |
flow matching |
|
|
| 30 |
Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning |
Proposes a hyperbolic geometry method to address model compatibility |
representation learning |
|
|