| # | Title | Summary | Tags | Read |
|---|-------|---------|------|------|
| 1 | Adaptive trajectory-constrained exploration strategy for deep reinforcement learning | Proposes an adaptive trajectory-constrained exploration strategy to address the exploration problem in deep reinforcement learning. | reinforcement learning, deep reinforcement learning, DRL | ✅ |
| 2 | Model Selection for Inverse Reinforcement Learning via Structural Risk Minimization | Proposes a model selection method for inverse reinforcement learning based on structural risk minimization. | reinforcement learning, inverse reinforcement learning | |
| 3 | Preference as Reward, Maximum Preference Optimization with Importance Sampling | Proposes Maximum Preference Optimization (MPO) with importance sampling to improve the alignment of language models with human values. | reinforcement learning, PPO, preference learning | |
| 4 | Soft Contrastive Learning for Time Series | Proposes SoftCLT, which improves time-series representation quality via soft contrastive learning (see the first sketch below). | contrastive learning, TAMP | ✅ |
| 5 | MIM4DD: Mutual Information Maximization for Dataset Distillation | MIM4DD: dataset distillation via mutual information maximization, improving how much information the distilled set retains. | contrastive learning, distillation | |
| 6 | Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning | Proposes dynamic sub-graph distillation (DSGD) to address catastrophic forgetting in semi-supervised continual learning. | distillation | |
| 7 | Foundations of Reinforcement Learning and Interactive Decision Making | Builds a statistical-theory framework for reinforcement learning and interactive decision making, focusing on function approximation and high-dimensional feedback. | reinforcement learning | |
| 8 | Active Third-Person Imitation Learning | Proposes an active third-person imitation learning framework to address the viewpoint-selection problem. | imitation learning | |
| 9 | Learning to Embed Time Series Patches Independently | Proposes embedding time-series patches independently, improving forecasting and classification performance (see the second sketch below). | representation learning, contrastive learning | ✅ |
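A minimal sketch of the soft contrastive idea behind entry 4 (SoftCLT): instead of hard 0/1 positive labels, each pair of series gets a soft target weight derived from a data-space distance. The temperature `tau`, the sharpness `sigma`, and the use of plain L2 distance are illustrative assumptions (the paper derives its soft assignments from DTW and temporal proximity), not the paper's settings.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z: torch.Tensor, x: torch.Tensor,
                          tau: float = 0.1, sigma: float = 1.0) -> torch.Tensor:
    """z: (N, D) embeddings; x: (N, T) raw series used to build soft targets."""
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    # Cosine similarities between embeddings; self-pairs are excluded.
    zn = F.normalize(z, dim=1)
    sim = (zn @ zn.T) / tau
    sim = sim.masked_fill(eye, -1e9)
    # Soft positive weights: series that are closer in data space receive a
    # larger target weight, replacing the usual hard 0/1 positive labels.
    # (SoftCLT uses DTW distances; plain L2 here is an assumption.)
    w = torch.exp(-torch.cdist(x, x) / sigma).masked_fill(eye, 0.0)
    targets = w / w.sum(dim=1, keepdim=True)
    # Cross-entropy between the softmax over similarities and the soft targets.
    return -(targets * F.log_softmax(sim, dim=1)).sum(dim=1).mean()

# e.g. soft_contrastive_loss(torch.randn(32, 64), torch.randn(32, 100))
```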
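Likewise, a minimal sketch of patch-independent embedding for entry 9: each patch is encoded by a shared MLP that never mixes information across patches, in contrast to attention-based patch encoders. All dimensions and the MLP shape are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PatchIndependentEncoder(nn.Module):
    def __init__(self, patch_len: int = 16, d_model: int = 128):
        super().__init__()
        self.patch_len = patch_len
        # Shared per-patch MLP; since it never mixes patches, every patch is
        # embedded independently of its neighbours.
        self.mlp = nn.Sequential(
            nn.Linear(patch_len, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, series_len); series_len must be a multiple of patch_len.
        b, t = x.shape
        patches = x.view(b, t // self.patch_len, self.patch_len)  # (B, P, L)
        return self.mlp(patches)                                   # (B, P, d_model)

# e.g. PatchIndependentEncoder()(torch.randn(8, 128)).shape -> (8, 8, 128)
```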