cs.LG (2025-12-23)
📊 3 papers in total
🎯 Interest Area Navigation
🔬 Pillar 2: RL Algorithms & Architecture (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
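| 1 | Masking Teacher and Reinforcing Student for Distilling Vision-Language Models | Proposes the Masters framework, which distills vision-language models effectively by masking the teacher model and reinforcing the student model. | reinforcement learning, offline RL, distillation | | |
| 2 | Generalization of RLVR Using Causal Reasoning as a Testbed | Uses causal reasoning as a testbed to study how well RLVR generalizes on complex reasoning tasks. | reinforcement learning, large language model | | |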
🔬 Pillar 9: Embodied Foundation Models (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
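| 3 | Learning to Reason in LLMs by Expectation Maximization | Proposes an expectation-maximization-based framework for learning to reason in LLMs that optimizes the generation of rationales. | large language model | | |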