| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|----|
| 1 | Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning | Proposes the STREAM-RL framework for safe urban traffic control via uncertainty-aware conformal prediction and world-model reinforcement learning. | reinforcement learning, policy learning, PPO | |
| 2 | Training A Foundation Model to Represent Graphs as Vectors | Proposes a method for training graph foundation models that represent graphs as vectors, targeting graph-level tasks such as graph classification and graph clustering. | contrastive learning, foundation model | |
| 3 | Rethinking the Trust Region in LLM Reinforcement Learning | Proposes the DPPO algorithm, which directly estimates policy divergence to improve the stability and efficiency of LLM reinforcement learning. | reinforcement learning, PPO, large language model | |
| 4 | Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning | Proposes the T2T dynamic reward framework, which improves LLM reasoning by emulating human learning dynamics. | reinforcement learning, reward shaping, large language model | |
| 5 | EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL | EMA-PG: improves the stability and performance of LLM reinforcement learning via an EMA anchor and Top-k KL. | reinforcement learning, large language model | ✅ |
| 6 | REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency | REDistill: a robust estimator distillation method that balances robustness and efficiency. | teacher-student distillation | |
| 7 | Beyond Rewards in Reinforcement Learning for Cyber Defence | Proposes a sparse-reward mechanism to optimize reinforcement learning for cyber defence. | reinforcement learning, deep reinforcement learning | |
| 8 | SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF | SAFE: stable alignment finetuning for RLHF via entropy-aware predictive control. | PPO, RLHF | ✅ |
| 9 | Stochastic Decision Horizons for Constrained Reinforcement Learning | Proposes a constrained reinforcement learning method based on stochastic decision horizons, improving sample efficiency and the return-violation trade-off. | reinforcement learning, SAC | |
| 10 | Topology-Aware Revival for Efficient Sparse Training | Proposes Topology-Aware Revival (TAR), improving the performance of static sparse training in reinforcement learning. | reinforcement learning, deep reinforcement learning, SAC | |
| 11 | Contrastive Continual Learning for Model Adaptability in Internet of Things | Proposes contrastive continual learning to address model adaptability in the Internet of Things. | representation learning, contrastive learning, distillation | |
| 12 | CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation | Proposes CRoSS: a scalable continual robot-learning platform with high task diversity and realistic physics simulation. | reinforcement learning | |
| 13 | The Key to State Reduction in Linear Attention: A Rank-based Perspective | Proposes a rank-based state-reduction method for linear attention, improving efficiency and reducing memory usage. | linear attention | ✅ |
| 14 | Rationality Measurement and Theory for Reinforcement Learning Agents | Proposes rationality measurement and theory to optimize the decision-making of reinforcement learning agents. | reinforcement learning | ✅ |
| 15 | DMFlow: Disordered Materials Generation by Flow Matching | Proposes DMFlow, which generates disordered materials via flow matching, filling the gap left by deep generative models in disordered crystal generation. | flow matching | |
| 16 | Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels | Proposes a knowledge-distillation-based mmWave beam prediction method that leverages sub-6 GHz channel information to reduce computational complexity. | distillation | |
| 17 | MirrorLA: Reflecting Feature Map for Vision Linear Attention | MirrorLA addresses the performance degradation of linear attention by reflecting feature maps, improving representational capacity. | linear attention | |
| 18 | Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting | Proposes a risk-sensitive reinforcement learning framework that supports general discounting, decoupling time preference from risk assessment. | reinforcement learning | |
| 19 | Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning | Proposes an afferent-learning framework based on evolved, biologically-inspired models for damage-avoidance learning. | reinforcement learning, policy learning | |
| 20 | From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution | Proposes a POMDP-based partial multi-label learning framework that resolves label ambiguity and optimizes feature selection. | reinforcement learning, transformer policy | |
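Several entries above (e.g. EMA Policy Gradient) lean on an exponential-moving-average anchor to stabilize policy updates. As a generic illustration only, and not any listed paper's actual algorithm, the standard EMA update keeps a slowly-trailing copy of the online parameters that a KL penalty can be measured against; the function name `ema_update` and the `decay` value are illustrative choices, not from the papers.

```python
def ema_update(anchor, online, decay=0.99):
    """Move each anchor parameter a small step toward the online one.

    The anchor changes by a factor (1 - decay) per step, so it lags
    the online parameters and can serve as a stable reference (e.g.
    for a KL penalty) while the online policy is updated aggressively.
    """
    return [decay * a + (1.0 - decay) * p for a, p in zip(anchor, online)]


# Toy usage: after a few updates the anchor has moved only part of the
# way toward the online parameters (1 - 0.9**3 = 0.271 of the gap).
anchor = [0.0, 0.0]
online = [1.0, 2.0]
for _ in range(3):
    anchor = ema_update(anchor, online, decay=0.9)
print(anchor)  # ~[0.271, 0.542]
```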