| # | Title | Summary | Keywords | Status |
| --- | --- | --- | --- | --- |
| 13 | CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations | Proposes CHEHAB RL, which uses deep reinforcement learning to optimize fully homomorphic encryption computations. | reinforcement learning, deep reinforcement learning, OMOMO | |
| 14 | Self-Distillation Enables Continual Learning | Proposes Self-Distillation Fine-Tuning (SDFT), enabling continual learning from demonstrations while mitigating catastrophic forgetting. | reinforcement learning, policy learning, distillation | |
| 15 | From Observations to Events: Event-Aware World Model for Reinforcement Learning | Proposes the Event-Aware World Model (EAWM), improving the generalization of MBRL across structurally similar scenarios. | reinforcement learning, policy learning, world model | ✅ |
| 16 | The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-Modal Divergence | Analyzes contrastive representation learning through geometric mechanics, revealing the intrinsic connections among alignment potentials, entropic dispersion, and cross-modal divergence. | representation learning, contrastive learning, multimodal | |
| 17 | On the Expressiveness of State Space Models via Temporal Logics | Analyzes the expressiveness of state space models (SSMs) through temporal logics. | SSM, state space model, large language model | |
| 18 | Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning | Proposes a Group DRO-based reinforcement learning framework that improves LLM performance on complex reasoning tasks. | reinforcement learning, large language model | |
| 19 | Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action | IRA: improves policy exploitation in online reinforcement learning via instant retrospect actions. | reinforcement learning, representation learning | |
| 20 | Privacy-Preserving Model Transcription with Differentially Private Synthetic Distillation | Proposes differentially private synthetic distillation, enabling data-free privacy-preserving model transcription. | distillation | |
| 21 | Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning | Proposes an adaptive entropy scheduling method to address environment drift in non-stationary reinforcement learning. | reinforcement learning | |
| 22 | R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning | R^3: improves LLM reinforcement learning performance on complex reasoning tasks via replay, reflection, and ranking rewards. | reinforcement learning | |
| 23 | Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making | Proposes a double fairness policy learning framework that integrates action fairness and outcome fairness in decision-making. | policy learning | |