| # | Title | Summary | Keywords | Status |
|---|---|---|---|---|
| 1 | Multimodal Functional Maximum Correlation for Emotion Recognition | Proposes a Multimodal Functional Maximum Correlation (MFMC) framework to improve emotion-recognition performance. | representation learning, multimodal | ✅ |
| 2 | A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms | Proposes a unified framework for hybrid online reinforcement and imitation learning for LLMs, improving fine-tuning efficiency. | reinforcement learning, imitation learning, large language model | |
| 3 | Trust Region Masking for Long-Horizon LLM Reinforcement Learning | Proposes Trust Region Masking to address trust-region failure in long-horizon LLM reinforcement learning. | reinforcement learning, PPO, large language model | |
| 4 | Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning | Proposes a stable LLM reinforcement-learning method based on dynamic vocabulary pruning, addressing the training–inference mismatch. | reinforcement learning, large language model | |
| 5 | ReDiF: Reinforced Distillation for Few Step Diffusion | Proposes ReDiF, a reinforcement-learning-based distillation framework for diffusion models, enabling efficient generation in fewer steps. | reinforcement learning, distillation | |
| 6 | Breaking the Memory Wall: Exact Analytical Differentiation via Tiled Operator-Space Evolution | Proposes the PGF framework, which uses tiled operator-space evolution to make exact analytical differentiation in selective state space models memory-efficient. | SSM, state space model, PULSE | |
| 7 | Value-guided action planning with JEPA world models | Proposes value-guided action planning with JEPA world models, improving performance on control tasks. | world model | |
| 8 | APO: Alpha-Divergence Preference Optimization | Proposes Alpha-Divergence Preference Optimization (APO), which smoothly interpolates between forward and reverse KL divergence within an anchored framework, improving the stability and performance of alignment training. | reinforcement learning, PPO, distillation | |
| 9 | Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning | Sat-EnQ: reliable and compute-efficient reinforcement learning via satisficing ensembles of weak Q-learners. | reinforcement learning | |
| 10 | Causal-Policy Forest for End-to-End Policy Learning | Proposes the Causal-Policy Forest algorithm for end-to-end causal policy learning. | policy learning | |
| 11 | Long-Range Distillation: Distilling 10,000 Years of Simulated Climate into Long Timestep AI Weather Models | Proposes Long-Range Distillation, which uses AI-generated climate data to improve the forecasting ability of long-timestep AI weather models. | distillation | |
| 12 | FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents | Proposes FoldAct to address context folding for long-horizon search agents. | reinforcement learning, large language model | ✅ |