| 8 |
Deep Reinforcement Learning for Day-to-day Dynamic Tolling in Tradable Credit Schemes |
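Proposes a deep reinforcement learning method for dynamic tolling in tradable credit schemes, aimed at reducing traffic congestion. |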
reinforcement learning, deep reinforcement learning |
|
|
| 9 |
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning |
VL-Rethinker: uses reinforcement learning to incentivize self-reflection in vision-language models, improving their complex reasoning ability |
reinforcement learning, distillation, multimodal |
|
|
| 10 |
Rethinking the Foundations for Continual Reinforcement Learning |
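Re-examines the theoretical foundations of continual reinforcement learning and proposes a new formalism based on history processes |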
reinforcement learning |
|
|
| 11 |
A Relative Ignorability Framework for Decision-Relevant Observability in Control Theory and Reinforcement Learning |
Proposes a relative ignorability framework to address decision-relevant observability in control theory and reinforcement learning |
reinforcement learning |
|
|
| 12 |
ms-Mamba: Multi-scale Mamba for Time-Series Forecasting |
Proposes ms-Mamba, a multi-scale Mamba architecture for time-series forecasting. |
Mamba |
|
|
| 13 |
Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation |
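Proposes HeteroAKD, a knowledge distillation method for semantic segmentation across heterogeneous architectures that improves student model performance. |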
teacher-student distillation |
|
|
| 14 |
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining |
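Shows that RL post-training amplifies behaviors learned in pretraining, revealing training biases and generalization properties of mathematical reasoning models |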
reinforcement learning, PPO |
|
|