| 23 | LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning | Proposes LLM-ODDR, a large-language-model framework for jointly optimizing order dispatching and driver repositioning in ride-hailing. | reinforcement learning, spatiotemporal, large language model | |
| 24 | A Closer Look at Multimodal Representation Collapse | Reveals the mechanism behind multimodal representation collapse and proposes an explicit basis reallocation algorithm to improve multimodal fusion. | distillation, multimodal | ✅ |
| 25 | Scaling Offline RL via Efficient and Expressive Shortcut Models | Proposes SORL, which scales offline reinforcement learning with efficient and expressive shortcut models. | reinforcement learning, offline RL, offline reinforcement learning | |
| 26 | SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Proposes SOReL and TOReL to tackle hyperparameter tuning and performance estimation in fully offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | ✅ |
| 27 | Estimating the Effects of Sample Training Orders for Large Language Models without Retraining | Proposes a retraining-free framework for estimating the effect of training-sample order on large language models. | curriculum learning, large language model | |
| 28 | Preference Learning with Response Time: Robust Losses and Guarantees | Proposes preference learning with response times, improving the sample efficiency of reward-model learning with theoretical guarantees. | preference learning, foundation model | |
| 29 | Skywork Open Reasoner 1 Technical Report | Skywork-OR1: improves the reasoning ability of long-CoT models via reinforcement learning, significantly outperforming models of the same scale. | reinforcement learning, large language model, chain-of-thought | |
| 30 | Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding | Proposes DRG-Sapphire, using reinforcement learning to address out-of-distribution reasoning of LLMs in DRG coding. | reinforcement learning, large language model | |
| 31 | SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training | Proposes SDPO to address bias and instability in diffusion-model training. | preference learning, DPO, direct preference optimization | |
| 32 | Scaling Reasoning without Attention | Proposes an attention-free language model to address inefficiency in reasoning. | Mamba, state space model, large language model | |
| 33 | Two-Stage Feature Generation with Transformer and Reinforcement Learning | Proposes a two-stage feature-generation framework based on Transformers and reinforcement learning, improving model performance and adaptability. | reinforcement learning, PPO | |
| 34 | A Provable Approach for End-to-End Safe Reinforcement Learning | Proposes PLS, a provable end-to-end safe reinforcement learning method that guarantees safety throughout learning and deployment. | reinforcement learning | |
| 35 | Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking | Proposes the Contraction Actor-Critic algorithm for robust path tracking under unknown dynamics. | reinforcement learning | |
| 36 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | Proposes an entropy-mechanism-based reinforcement learning method for reasoning language models, addressing policy entropy collapse. | reinforcement learning | |
| 37 | Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation | Proposes a physics-informed distillation method for PDE-constrained generation with diffusion models. | distillation | |
| 38 | When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? | Investigates when neuroevolution can outcompete reinforcement learning on transfer learning tasks. | reinforcement learning | |
| 39 | An Augmentation-Aware Theory for Self-Supervised Contrastive Learning | Proposes an augmentation-aware theoretical framework for self-supervised contrastive learning that explicitly models the effect of data augmentation. | contrastive learning | |
| 40 | Weakly-Supervised Contrastive Learning for Imprecise Class Labels | Proposes a graph-based weakly-supervised contrastive learning framework for representation learning under imprecise labels. | contrastive learning | ✅ |
| 41 | FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators | FNOPE: simulation-based inference on function spaces with Fourier Neural Operators, improving the efficiency of spatiotemporal process modeling. | flow matching, spatiotemporal | |
| 42 | Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training | Improves Group Relative Policy Optimization, exploring its use in on-policy and off-policy training. | reinforcement learning, PPO | |
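As background for entry 42: Group Relative Policy Optimization (GRPO) replaces PPO's learned value baseline with a per-prompt group baseline. Below is a minimal sketch of the standard group-relative advantage computation; the function name and group size are illustrative and not taken from the paper.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages: for one prompt, normalize the
    rewards of a group of sampled completions by the group's own
    mean and standard deviation."""
    r = np.asarray(rewards, dtype=np.float64)  # shape: (group_size,)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, a group of 4 sampled completions with scalar rewards:
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-mean completions get positive advantage, below-mean negative;
# the advantages within a group sum to (approximately) zero.
```

These per-token-constant advantages then plug into a PPO-style clipped surrogate objective, which is why GRPO is usually discussed alongside on-policy vs. off-policy (importance-ratio) training.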