| 15 |
The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning |
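Surveys the use of LLMs and VLMs in reinforcement learning, addressing challenges such as lack of prior knowledge, long-horizon planning, and reward design |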
reinforcement learning reward design large language model |
|
|
| 16 |
Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification |
Mantis: a lightweight, calibrated foundation model for time series classification, designed to be more user-friendly |
contrastive learning foundation model |
|
|
| 17 |
Hyperspherical Normalization for Scalable Deep Reinforcement Learning |
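SimbaV2 uses hyperspherical normalization and reward scaling to improve the scalability and stability of deep reinforcement learning on large models. |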
reinforcement learning deep reinforcement learning |
✅ |
|
| 18 |
SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning |
SALSA-RL: a reinforcement learning method based on stability analysis in the latent space of actions, improving interpretability. |
reinforcement learning deep reinforcement learning DRL |
|
|
| 19 |
Enhancing PPO with Trajectory-Aware Hybrid Policies |
Proposes the HP3O algorithm, which augments PPO with a trajectory replay buffer to improve reinforcement learning performance |
reinforcement learning PPO |
|
|
| 20 |
Towards a Reward-Free Reinforcement Learning Framework for Vehicle Control |
Proposes a reward-free reinforcement learning framework to address the bias introduced by hand-designed rewards in vehicle control. |
reinforcement learning imitation learning |
|
|
| 21 |
SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning |
SpikeRL: a scalable and energy-efficient deep spiking reinforcement learning framework for complex continuous control tasks. |
reinforcement learning deep reinforcement learning |
|
|
| 22 |
Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF |
Proposes a projection optimization framework that efficiently solves multi-objective and multi-group RLHF problems |
reinforcement learning RLHF |
|
|
| 23 |
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors |
Proposes generalization guarantees for representation learning based on data-dependent Gaussian mixture priors |
representation learning |
|
|