| 1 |
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach |
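Proposes the MedSSR framework, which uses knowledge-enhanced data synthesis and semi-supervised reinforcement learning to improve medical reasoning. |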
reinforcement learning distillation large language model |
✅ |
|
| 2 |
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping |
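Proposes the MEDS framework, which uses memory-enhanced dynamic reward shaping to increase LLM sampling diversity and reduce repeated errors. |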
reinforcement learning reward design reward shaping |
|
|
| 3 |
Probabilistic Prediction of Neural Dynamics via Autoregressive Flow Matching |
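Proposes a probabilistic prediction framework for neural dynamics based on autoregressive flow matching, improving the accuracy of brain activity prediction. |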
flow matching multimodal |
|
|
| 4 |
DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO |
DDO-RM: a minimal held-out benchmark for LLM preference optimization, compared against DPO. |
DPO direct preference optimization |
|
|
| 5 |
DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation |
DIB-OD: robust heterogeneous graph adaptation via a decoupled information bottleneck and online distillation. |
teacher-student distillation |
|
|
| 6 |
Autonomous Diffractometry Enabled by Visual Reinforcement Learning |
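Proposes an autonomous diffractometry system based on visual reinforcement learning that aligns crystals automatically without requiring crystallographic knowledge. |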
reinforcement learning |
|
|
| 7 |
Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning |
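Proposes a quantum-gated task-interaction knowledge distillation framework to address catastrophic forgetting in class-incremental learning with pre-trained models. |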
distillation |
|
|
| 8 |
Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems |
Proposes a thermodynamic liquid manifold network for reliable solar irradiance forecasting in off-grid systems. |
state space model |
|
|
| 9 |
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration |
Proposes the NExt framework, which accelerates RLVR training of LLMs by nonlinearly extrapolating low-rank trajectories. |
reinforcement learning large language model |
✅ |
|
| 10 |
Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis |
Proposes Entropy-Aware Policy Optimization (EAPO) to address token-level credit assignment in RLVR. |
reinforcement learning large language model |
|
|