| 11 | RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale | RADLADS: converts transformers into linear attention decoders at scale via rapid attention distillation | linear attention, distillation | ✅ |
| 12 | RM-R1: Reward Modeling as Reasoning | Proposes RM-R1, which strengthens reward models through explicit reasoning, improving LLM alignment with human preferences | reinforcement learning, distillation, large language model | ✅ |
| 13 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Survey: reviews progress in LLM alignment from the perspective of reward design | reinforcement learning, reward design, large language model | |
| 14 | Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition | Proposes GSP-MC, which improves sign language recognition accuracy with generative sign-description prompts and multi-positive contrastive learning | contrastive learning, large language model | |
| 15 | EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning | EMORL: ensemble multi-objective reinforcement learning for efficient and flexible fine-tuning of large language models | reinforcement learning, large language model | |
| 16 | Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards | Survey: reward models and learning strategies for LLMs that learn from rewards | reinforcement learning, RLHF, DPO | ✅ |
| 17 | JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings | Proposes the JTCSE framework, which improves unsupervised contrastive sentence embeddings via joint tensor-modulus constraints and cross-attention | contrastive learning, distillation | |
| 18 | SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | SIMPLEMIX: a simple, effective strategy that mixes off- and on-policy data to improve language model preference learning | preference learning, DPO | |