| 1 |
Reinforcement Learning via Self-Distillation |
Proposes Self-Distillation Policy Optimization (SDPO), which uses feedback information to improve reinforcement learning. |
reinforcement learning, distillation, large language model |
|
|
| 2 |
PatchFormer: A Patch-Based Time Series Foundation Model with Hierarchical Masked Reconstruction and Cross-Domain Transfer Learning for Zero-Shot Multi-Horizon Forecasting |
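PatchFormer: a time-series foundation model built on hierarchical masked reconstruction and cross-domain transfer learning, for zero-shot multi-horizon forecasting. |
distillation, foundation model |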
|
|
| 3 |
Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models |
Proposes a PU-RL distillation method for reinforcement-learning alignment of on-premise small models. |
reinforcement learning, direct preference optimization, distillation |
|
|
| 4 |
Less is More: Clustered Cross-Covariance Control for Offline RL |
Proposes Clustered Cross-Covariance Control (C^4) to address distribution shift in offline reinforcement learning. |
reinforcement learning, policy learning, offline RL |
|
|
| 5 |
Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers |
Proposes a forecast-driven DRL method for proactive SFC (service function chain) resource provisioning in data centers. |
reinforcement learning, deep reinforcement learning, DRL |
|
|
| 6 |
GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning |
Proposes GraphAllocBench, a flexible benchmark for preference-conditioned multi-objective policy learning. |
reinforcement learning, policy learning |
|
|
| 7 |
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning |
Proposes failure-prefix conditioning to address training stagnation when LLMs are trained on saturated reasoning problems. |
reinforcement learning, large language model |
|
|
| 8 |
Ranking-aware Reinforcement Learning for Ordinal Ranking |
Proposes a Ranking-Aware Reinforcement Learning (RARL) framework to address the difficulty of modeling dependencies in ordinal ranking. |
reinforcement learning |
|
|
| 9 |
CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes |
Proposes CCMamba, a selective state-space model for higher-order graph learning on combinatorial complexes. |
Mamba |
|
|
| 10 |
C2: Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding |
C2: a decision transformer enhanced with a cross-learning module and a constraint-aware loss, to improve auto-bidding performance. |
decision transformer |
✅ |
|
| 11 |
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning |
Spark: strategic, policy-aware exploration via dynamic branching, addressing resource allocation in long-horizon agentic learning. |
reinforcement learning, large language model |
|
|
| 12 |
Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery |
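Proposes a meta-cognitive reinforcement learning framework based on self-doubt and recovery, improving robustness in environments with corrupted rewards. |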
reinforcement learning |
|
|
| 13 |
Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning |
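Proposes a unified spectral-analysis framework for self-supervised learning that improves the efficiency of representation learning. |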
representation learning |
|
|