| 9 |
The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models |
Proposes a theoretical framework of reward-policy maps to analyze policy brittleness and instability in large language models |
reinforcement learning RLHF large language model |
|
|
| 10 |
NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis |
NeuroCLIP: a multimodal contrastive learning method for analyzing rTMS-treated methamphetamine addiction |
contrastive learning multimodal |
|
|
| 11 |
Learning to Align Human Code Preferences |
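Proposes Adaptive Preference Optimization (APO), which dynamically aligns LLMs with code preferences to improve code generation quality. |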
DPO direct preference optimization large language model |
|
|
| 12 |
Multi-Agent Reinforcement Learning for Dynamic Mobility Resource Allocation with Hierarchical Adaptive Grouping |
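Proposes a multi-agent reinforcement learning method based on hierarchical adaptive grouping for dynamic mobility resource allocation. |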
reinforcement learning |
|
|
| 13 |
StepFun-Prover Preview: Let's Think and Verify Step by Step |
StepFun-Prover Preview: proposes a large language model with tool-integrated reasoning for formal theorem proving. |
reinforcement learning large language model |
|
|
| 14 |
Concept Learning for Cooperative Multi-Agent Reinforcement Learning |
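Proposes CMQ, a concept-bottleneck-based multi-agent reinforcement learning method that improves the interpretability and performance of cooperative policies. |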
reinforcement learning |
|
|