| 1 |
You Can Learn Tokenization End-to-End with Reinforcement Learning |
Proposes an end-to-end tokenization method based on reinforcement learning that improves large language model performance |
reinforcement learning large language model |
|
|
| 2 |
Zero-Shot Instruction Following in RL via Structured LTL Representations |
Proposes a zero-shot instruction-following method for reinforcement learning based on structured LTL representations |
reinforcement learning instruction following |
|
|
| 3 |
EIDOS: Latent-Space Predictive Learning for Time Series Foundation Models |
EIDOS: a latent-space predictive learning framework for time series foundation models |
latent dynamics foundation model |
|
|
| 4 |
Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling |
Proposes Deep Dense Exploration to address the exploration problem in reinforcement learning for large language models |
reinforcement learning policy learning large language model |
|
|
| 5 |
Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning |
AERO: adaptive efficient rollout optimization that improves the efficiency of LLM fine-tuning with group-based reinforcement learning |
reinforcement learning large language model |
|
|
| 6 |
DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices |
DeepFusion: accelerates MoE model training via federated knowledge distillation from heterogeneous edge devices |
distillation large language model |
|
|
| 7 |
KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning |
KernelBlaster: continual cross-task CUDA optimization via memory-augmented in-context reinforcement learning |
reinforcement learning large language model |
|
|
| 8 |
QuRL: Efficient Reinforcement Learning with Quantized Rollout |
QuRL: accelerates training for reinforcement learning with verifiable rewards through quantized rollout |
reinforcement learning large language model |
|
|
| 9 |
Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study |
Proposes a robust reinforcement learning control method based on a Conformal STL Shield that improves flight control reliability |
reinforcement learning PPO |
|
|
| 10 |
Radial-VCReg: More Informative Representation Learning Through Radial Gaussianization |
Proposes Radial-VCReg, which learns more informative self-supervised representations through radial Gaussianization |
representation learning |
|
|
| 11 |
Experiential Reinforcement Learning |
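Proposes Experiential Reinforcement Learning (ERL), which improves the learning efficiency of language models in sparse-reward environments through an explicit experience-reflection loop |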
reinforcement learning |
|
|