| 8 |
A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning |
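Proposes a multi-agent dynamic portfolio optimization system based on hierarchical deep reinforcement learning to improve risk-adjusted returns. |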
reinforcement learning deep reinforcement learning DRL |
|
|
| 9 |
Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference |
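Proposes semiparametric double reinforcement learning for long-term causal inference, improving the efficiency of policy value estimation. |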
reinforcement learning DRL |
|
|
| 10 |
DRDT3: Diffusion-Refined Decision Test-Time Training Model |
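Proposes the DRDT3 model, which combines diffusion models with test-time training to improve Decision Transformer performance in offline reinforcement learning. |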
reinforcement learning offline RL offline reinforcement learning |
|
|
| 11 |
Average Reward Reinforcement Learning for Wireless Radio Resource Management |
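Proposes an average-reward off-policy soft actor-critic algorithm to address the mismatch between discounted rewards and long-term objectives in wireless radio resource management. |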
reinforcement learning SAC |
|
|
| 12 |
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training |
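Proposes the SPAM optimizer, which uses momentum reset and gradient clipping to address gradient explosion in LLM training, improving training stability and resource efficiency. |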
reinforcement learning large language model |
✅ |
|
| 13 |
Pareto Set Learning for Multi-Objective Reinforcement Learning |
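Proposes PSL-MORL, which uses a hypernetwork to learn the Pareto set and efficiently solve multi-objective reinforcement learning problems. |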
reinforcement learning |
|
|