| # | Title | Summary | Keywords | Status |
|---|-------|---------|----------|--------|
| 1 | Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models | Post-Transformer compression for efficient selective structured state space models. | Mamba, SSM, state space model | ✅ |
| 2 | Decoding Human Preferences in Alignment: An Improved Approach to Inverse Constitutional AI | Improves Inverse Constitutional AI to extract principles from preference datasets with higher accuracy and better generalization. | reinforcement learning, RLHF, DPO | |
| 3 | On the Interplay Between Sparsity and Training in Deep Reinforcement Learning | Studies the role of sparse architectures in deep reinforcement learning, improving performance on image-based tasks. | reinforcement learning, deep reinforcement learning | |
| 4 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Analyzes the limitations of reinforcement learning for safety alignment in DeepSeek-R1 models and proposes a hybrid training scheme. | reinforcement learning, large language model | |
| 5 | Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning | Proposes HAPFL, which uses adaptive dual-agent reinforcement learning to personalize federated learning in heterogeneous environments. | reinforcement learning, PPO, distillation | |
| 6 | TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models | Proposes TAID to address the capacity gap in language model distillation. | distillation, foundation model | |
| 7 | On Rollouts in Model-Based Reinforcement Learning | Proposes Infoprop, which separates out model uncertainty to improve rollout quality in model-based reinforcement learning. | reinforcement learning, policy learning | |
| 8 | Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning | Studies the generalization and robustness of maximum-entropy reinforcement learning on chaotic dynamical systems. | reinforcement learning | |
| 9 | Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Evaluates LLMs' token-by-token regeneration ability and domain biases on advanced mathematical problem-solving. | Mamba, large language model | |
| 10 | Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning | Proposes a functional-module-based interpretability pipeline for reinforcement learning. | reinforcement learning | |
| 11 | Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans | Flow Matching: learning a generative model's velocity field via Markov kernels, stochastic processes, and transport plans. | flow matching | |
| 12 | Safe Reinforcement Learning for Real-World Engine Control | Proposes a reinforcement learning toolchain with safety monitoring for real-world engine control. | reinforcement learning | |