| 1 |
Enhancing Q-Learning with Large Language Model Heuristics |
Proposes LLM-guided Q-learning, which improves the sample efficiency of reinforcement learning while avoiding bias. |
reinforcement learning reward shaping large language model |
|
|
| 2 |
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning |
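Proposes Reverse Forward Curriculum Learning (RFCL), which improves the sample and demonstration efficiency of reinforcement learning on sparse-reward tasks. |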
reinforcement learning curriculum learning |
|
|
| 3 |
A Generalization Theory of Cross-Modality Distillation with Contrastive Learning |
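Proposes CMCD, a cross-modality contrastive distillation framework, and theoretically analyzes how the distance between modalities affects generalization performance. |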
contrastive learning distillation |
|
|
| 4 |
Federated Reinforcement Learning with Constraint Heterogeneity |
Proposes FedNPG and FedPPO to solve federated reinforcement learning under constraint heterogeneity. |
reinforcement learning PPO large language model |
|
|
| 5 |
Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows |
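Proposes the MOOD-CRL algorithm, which addresses out-of-distribution adaptation in offline reinforcement learning through causal normalizing flows. |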
reinforcement learning policy learning offline RL |
|
|
| 6 |
Position: Leverage Foundational Models for Black-Box Optimization |
Uses sequence models to empower black-box optimization, exploring the application of LLMs to experimental design. |
reinforcement learning large language model foundation model |
|
|
| 7 |
Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning |
Proposes the Chunked-TD algorithm to speed up credit assignment in reinforcement learning. |
reinforcement learning world model |
|
|
| 8 |
End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability |
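Proposes a deep-reinforcement-learning method for coordinated active and reactive power control in distribution grids, addressing voltage limit violations under partial observability. |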
reinforcement learning deep reinforcement learning |
|
|
| 9 |
Functional Latent Dynamics for Irregularly Sampled Time Series Forecasting |
Proposes the Functional Latent Dynamics (FLD) model, which efficiently handles forecasting of irregularly sampled time series. |
latent dynamics |
|
|
| 10 |
ReinWiFi: Application-Layer QoS Optimization of WiFi Networks with Reinforcement Learning |
Proposes ReinWiFi, a reinforcement-learning-based framework that optimizes QoS in WiFi networks serving heterogeneous applications. |
reinforcement learning |
|
|
| 11 |
Improved Forward-Forward Contrastive Learning |
Proposes an improved Forward-Forward contrastive learning algorithm that requires no backpropagation and is more biologically plausible. |
contrastive learning |
|
|
| 12 |
Policy Learning for Balancing Short-Term and Long-Term Rewards |
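Proposes a policy learning framework that balances short-term and long-term rewards, addressing the problem of evaluating long-term effects. |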
policy learning |
|
|