| 17 | Adversarial Reinforcement Learning for Large Language Model Agent Safety | Proposes ARLAS, which uses adversarial reinforcement learning to improve the safety of LLM agents and defend against prompt injection attacks. | reinforcement learning, large language model | |
| 18 | Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization | Proposes Margin Adaptive DPO, which leverages a reward model for fine-grained control in preference optimization. | DPO, direct preference optimization, large language model | |
| 19 | Adjusting the Output of Decision Transformer with Action Gradient | Proposes an action-gradient method for adjusting Decision Transformer outputs, improving offline reinforcement learning performance. | reinforcement learning, offline RL, decision transformer | |
| 20 | Boomerang Distillation Enables Zero-Shot Model Size Interpolation | Boomerang distillation enables zero-shot model-size interpolation, efficiently constructing a family of models. | distillation, large language model | ✅ |
| 21 | MCCE: A Framework for Multi-LLM Collaborative Co-Evolution | Proposes the MCCE framework, which uses multi-LLM collaborative co-evolution to solve multi-objective discrete optimization problems. | reinforcement learning, distillation, large language model | |
| 22 | Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions | Proposes an efficient partial information decomposition method for latent Gaussian spaces based on normalizing flows. | predictive model, multimodal | |
| 23 | Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding | Proposes the DVI framework to address the latency bottleneck of autoregressive decoding. | distillation, large language model | |
| 24 | Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration | Proposes a curiosity-driven framework for robots to learn action and language, achieving autonomous exploration and compositional generalization. | reinforcement learning, large language model | |
| 25 | Reinforce-Ada: An Adaptive Sampling Framework under Non-linear RL Objectives | Proposes Reinforce-Ada, an adaptive sampling framework that addresses signal loss in LLM reasoning under non-linear RL objectives. | reinforcement learning, large language model | ✅ |
| 26 | Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails | Reveals the alignment tipping process that arises as LLM agents self-evolve, and the threat it poses to long-term reliability. | reinforcement learning, large language model | ✅ |
| 27 | Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning | Proposes test-time curriculum reinforcement learning (TTC-RL) to address continual learning of models on specific target tasks. | reinforcement learning | |