| 16 | Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models | Proposes a "machine bullshit" framework and accompanying metrics, characterizing the emergent disregard for truth in large language models | reinforcement learning, RLHF, large language model | |
| 17 | Towards Interpretable Time Series Foundation Models | Proposes an instruction-tuned small language model for interpretable time series analysis | distillation, foundation model, multimodal | |
| 18 | Distilling Empathy from Large Language Models | Proposes a two-stage fine-tuning approach built on prompt engineering to distill the empathy of large language models into small language models | distillation, large language model | |
| 19 | Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks | Compresses LLMs via knowledge distillation, exploring the limits of model compression on QA tasks | distillation, large language model | |
| 20 | The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Reveals the synergy dilemma between long-CoT SFT and RL for reasoning VLMs, probing the bottlenecks of post-training techniques | reinforcement learning, multimodal, chain-of-thought | |
| 21 | RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning | RLEP: a reinforcement learning method that strengthens LLM reasoning through experience replay | reinforcement learning, large language model | ✅ |
| 22 | SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment | Proposes SAGE, which tackles industrial anomaly detection with VLMs via fact enhancement and entropy-aware alignment | DPO, direct preference optimization, multimodal | ✅ |
| 23 | Not All Preferences are What You Need for Post-Training: Selective Alignment Strategy for Preference Optimization | Proposes Selective-DPO, a selective alignment strategy that improves the efficiency and accuracy of LLM preference optimization | DPO, distillation, large language model | ✅ |
| 24 | Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code | TeaR: improves LLM reasoning via reinforcement learning on algorithmic problems, with no code writing required | reinforcement learning, distillation | |
| 25 | PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving | Proposes PLAN-TUNING, which improves small models on complex problem solving by having them imitate planning processes | reinforcement learning, large language model | |