| 13 | Towards a Unified View of Large Language Model Post-Training | Proposes a unified view of LLM post-training and a hybrid post-training algorithm, HPT, to improve mathematical reasoning. | reinforcement learning, large language model |
| 14 | RL's Razor: Why Online Reinforcement Learning Forgets Less | Explains why RL fine-tuning outperforms SFT: online reinforcement learning exhibits less forgetting. | reinforcement learning, large language model, foundation model |
| 15 | Rethinking the long-range dependency in Mamba/SSM and transformer models | A theoretical analysis of long-range dependency modeling in Mamba/SSM and Transformer models. | Mamba, SSM |
| 16 | Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning | Proposes the Wavelet Fourier Diffuser to address trajectory frequency shift in offline reinforcement learning. | reinforcement learning, offline reinforcement learning |
| 17 | Data-Augmented Quantization-Aware Knowledge Distillation | Proposes a data-augmented quantization-aware knowledge distillation method to improve the accuracy of low-bit models. | distillation |
| 18 | Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology | Reveals the underlying connections among reinforcement learning with feedback, test-time scaling, and diffusion guidance, and proposes a resampling-based alignment method. | reinforcement learning |
| 19 | Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning | Proposes a resource-aware neural network pruning method based on graph reinforcement learning, improving pruning efficiency and performance. | reinforcement learning |
| 20 | Parking Availability Prediction via Fusing Multi-Source Data with A Self-Supervised Learning Enhanced Spatio-Temporal Inverted Transformer | Proposes SST-iTransformer, which fuses multi-source data with self-supervised learning to improve parking availability prediction accuracy. | representation learning, MAE |