| 9 |
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization |
The implicit reward model induced by DPO has limited generalization and is less robust than an explicit reward model
reinforcement learning RLHF DPO |
|
|
| 10 |
Discovering Cyclists' Visual Preferences Through Shared Bike Trajectories and Street View Images Using Inverse Reinforcement Learning |
Proposes an inverse-reinforcement-learning framework that discovers cyclists' visual preferences from shared-bike trajectories and street view images
reinforcement learning inverse reinforcement learning |
|
|
| 11 |
Asynchronous Stochastic Approximation with Applications to Average-Reward Reinforcement Learning |
Extends asynchronous stochastic approximation algorithms, providing broader convergence guarantees for average-reward reinforcement learning
reinforcement learning |
|
|
| 12 |
Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron |
Proposes a framework for analyzing learning dynamics in the non-linear perceptron, studying the differences between supervised and reinforcement learning
reinforcement learning |
|
|
| 13 |
CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning |
Proposes CHIRP metrics to predict the impact of environment changes on the performance of lifelong reinforcement learning agents
reinforcement learning |
|
|
| 14 |
ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models |
Proposes ERRL, an ELO-rating-based sequence reward method that addresses the difficulty of reward function design in long-horizon reinforcement learning
reinforcement learning |
|
|
| 15 |
Causal Temporal Representation Learning with Nonstationary Sparse Transition |
Proposes the CtrlNS framework, reducing the reliance on prior knowledge when learning causal relations from nonstationary time series
representation learning |
|
|
| 16 |
Sparsifying Parametric Models with L0 Regularization |
Sparsifies parametric models with L0 regularization, applied to deep-reinforcement-learning control of partial differential equations
reinforcement learning deep reinforcement learning |
✅ |
|