| # | Title | Summary | Keywords | Read |
|---|---|---|---|---|
| 1 | DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | DiffPoGAN: an offline RL method combining diffusion models with GANs to address extrapolation error. | reinforcement learning, offline RL | |
| 2 | Is Value Learning Really the Main Bottleneck in Offline RL? | A study of offline RL bottlenecks: policy extraction and generalization are the key limits, not value learning alone. | reinforcement learning, policy learning, offline RL | |
| 3 | CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms | Proposes CUER, which improves off-policy continuous-control deep RL via corrected uniform experience replay. | reinforcement learning, deep reinforcement learning | |
| 4 | A Dual Approach to Imitation Learning from Observations with Offline Datasets | DILO: a dual approach to imitation learning from observations using offline datasets. | policy learning, offline RL, imitation learning | ✅ |
| 5 | Cognitively Inspired Energy-Based World Models | Proposes energy-based world models (EBWM) that mimic human cognition to improve reasoning and planning in world models. | world model, large language model | |
| 6 | Online Bandit Learning with Offline Preference Data for Improved RLHF | Proposes the warmPref-PS algorithm, which leverages offline preference data to improve RLHF. | reinforcement learning, RLHF | |
| 7 | Q-S5: Towards Quantized State Space Models | Q-S5: a study of quantized state space models for edge deployment. | SSM, state space model | |
| 8 | CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving | CIMRL: a safe autonomous driving method combining imitation learning with reinforcement learning. | reinforcement learning, imitation learning | |
| 9 | You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning | In large-scale self-supervised learning, cropping-only data augmentation is enough to reach SOTA performance. | MAE, foundation model | |
| 10 | Federated Contrastive Learning for Personalized Semantic Communication | Proposes a federated contrastive learning framework for personalized semantic communication, addressing semantic imbalance under heterogeneous data. | contrastive learning | |
| 11 | XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning | Proposes XLand-100B, a large-scale dataset for improving generalization in in-context reinforcement learning. | reinforcement learning | |
| 12 | Current applications and potential future directions of reinforcement learning-based Digital Twins in agriculture | Survey: applications and future directions of reinforcement-learning-based digital twins in agriculture. | reinforcement learning | |
| 13 | Introducing Diminutive Causal Structure into Graph Representation Learning | Proposes a graph representation learning method based on diminutive causal structure, improving GNN performance on complex graph data. | representation learning | |
| 14 | Hadamard Representations: Augmenting Hyperbolic Tangents in RL | Proposes Hadamard representations to augment hyperbolic tangent activations in RL, mitigating the dying-neuron problem. | reinforcement learning, PPO | |
| 15 | T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation | Proposes T-JEPA, which improves trajectory similarity computation via a joint-embedding predictive architecture. | representation learning, contrastive learning | |