| 15 |
Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning |
Proposes a convexity-informed deep reinforcement learning method for learning value functions in belief MDPs |
reinforcement learning deep reinforcement learning DRL |
✅ |
|
| 16 |
When Do Neural Networks Learn World Models? |
Theoretically analyzes the ability of neural networks to learn world models in multi-task learning |
world model large language model |
|
|
| 17 |
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents |
Digi-Q: learns Q-value functions for training device-control agents, improving offline policy learning |
reinforcement learning policy learning foundation model |
✅ |
|
| 18 |
Reevaluating Policy Gradient Methods for Imperfect-Information Games |
Reevaluates the effectiveness of policy gradient methods in imperfect-information games |
reinforcement learning deep reinforcement learning DRL |
✅ |
|
| 19 |
A Survey of Reinforcement Learning for Optimization in Automation |
Survey: applications of reinforcement learning to optimization in automation, focusing on manufacturing, energy, and robotics |
reinforcement learning |
|
|
| 20 |
Variational Rectified Flow Matching |
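Proposes variational rectified flow matching, which improves generative model performance by modeling multi-modal velocity vector fields |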
flow matching |
|
|
| 21 |
Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning |
Proposes α-approximate portfolios to address the choice of social welfare function in multi-objective reinforcement learning |
reinforcement learning |
|
|
| 22 |
Neuro-Symbolic Contrastive Learning for Cross-domain Inference |
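Proposes a neuro-symbolic contrastive learning framework that improves the generalization of logical relations in cross-domain inference |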
contrastive learning |
|
|
| 23 |
Beyond Shallow Behavior: Task-Efficient Value-Based Multi-Task Offline MARL via Skill Discovery |
Proposes the SD-CQL algorithm to address task generalization and efficiency in offline multi-agent reinforcement learning |
CQL conservative q-learning behavior cloning |
|
|
| 24 |
SinSim: Sinkhorn-Regularized SimCLR |
SinSim: a Sinkhorn-regularized SimCLR that improves the structure of self-supervised learned representations |
representation learning contrastive learning |
|
|
| 25 |
Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation |
Analyzes the convergence of off-policy n-step TD-learning with linear function approximation |
reinforcement learning policy learning |
|
|