| 12 |
World Model for AI Autonomous Navigation in Mechanical Thrombectomy |
Proposes a world-model-based TD-MPC2 algorithm that improves AI autonomous navigation performance in mechanical thrombectomy |
reinforcement learning SAC world model |
|
|
| 13 |
MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis |
MDD-Thinker: a reasoning-enhanced large language model for major depressive disorder diagnosis |
reinforcement learning large language model multimodal |
|
|
| 14 |
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training |
A study of the greedy exploitation bias that emerges in meta-bandit LLM training |
reinforcement learning reward design large language model |
|
|
| 15 |
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends |
Reveals the off-policy nature of GRPO, providing a theoretical foundation and algorithmic guidance for off-policy reinforcement learning of LLMs |
reinforcement learning large language model |
✅ |
|
| 16 |
Safe In-Context Reinforcement Learning |
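Proposes a safe in-context reinforcement learning method that addresses safety constraints during adaptation without parameter updates |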
reinforcement learning |
|
|
| 17 |
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression |
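SIRI: improves the efficiency and accuracy of large reasoning models through iterative reinforcement learning with interleaved compression |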
reinforcement learning |
|
|
| 18 |
ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation |
Proposes ORPO-Distill, which achieves cross-architecture LLM distillation through mixed-policy preference optimization |
distillation |
|
|
| 19 |
Safe Reinforcement Learning-Based Vibration Control: Overcoming Training Risks with LQR Guidance |
Proposes an LQR-guided safe reinforcement learning method for vibration control that addresses safety risks during training |
reinforcement learning |
|
|
| 20 |
Machine Learning Algorithms for Improving Black Box Optimization Solvers |
Survey: machine learning algorithms for improving the performance of black-box optimization solvers |
reinforcement learning Mamba |
|
|
| 21 |
LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection |
LEAF: a robust expert-based framework for few-shot continual event detection |
contrastive learning distillation |
|
|