| 1 |
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift |
Proposes NS-DPO, which addresses non-stationary direct preference optimization for LLMs under preference drift |
DPO direct preference optimization large language model |
|
|
| 2 |
Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions |
Proposes a multi-hop STAR-RIS architecture to improve the energy efficiency of wireless communication |
reinforcement learning deep reinforcement learning |
|
|
| 3 |
DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning |
Proposes DTFormer, a Transformer-based method for discrete-time dynamic graph representation learning |
representation learning TAMP |
|
|
| 4 |
Contrastive Learning of Asset Embeddings from Financial Time Series |
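Proposes a contrastive learning method for asset embeddings from financial time series, applied to sector classification and portfolio optimization. |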
representation learning contrastive learning |
|
|
| 5 |
SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments |
SOAP-RL: reinforcement learning in POMDP environments via sequential option advantage propagation |
reinforcement learning |
✅ |
|
| 6 |
Reinforcement learning for anisotropic p-adaptation and error estimation in high-order solvers |
Proposes a reinforcement-learning-based anisotropic p-adaptation method, used to optimize error estimation in high-order solvers. |
reinforcement learning |
|
|
| 7 |
A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach |
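Proposes the MLMC-NAC algorithm, which achieves $\tilde{\mathcal{O}}(1/\sqrt{T})$ global convergence for average-reward reinforcement learning without requiring knowledge of the mixing and hitting times. |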
reinforcement learning |
|
|
| 8 |
Downlink Channel Covariance Matrix Estimation via Representation Learning with Graph Regularization |
Proposes a representation learning algorithm with graph regularization to estimate the downlink channel covariance matrix |
representation learning |
|
|
| 9 |
The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning |
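Proposes a cross-environment hyperparameter setting benchmark for evaluating the sensitivity of reinforcement learning algorithms to hyperparameters. |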
reinforcement learning |
|
|
| 10 |
Reinforcement Learning for Sustainable Energy: A Survey |
Survey paper: applications and challenges of reinforcement learning in sustainable energy |
reinforcement learning |
|
|