| 1 |
ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data |
Proposes the ChronoMedicalWorld model to address patient trajectory prediction from longitudinal clinical data |
world model world models MAE |
|
|
| 2 |
Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning |
Proposes Target-Aligned Bellman Backup (TABB) to address data transfer in cross-domain offline reinforcement learning. |
reinforcement learning policy learning offline RL |
|
|
| 3 |
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles |
Maestro: a reinforcement-learning-driven framework for hierarchical model-skill ensembles that improves performance on multimodal tasks |
reinforcement learning large language model multimodal |
✅ |
|
| 4 |
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation |
Analyzes the post-training methods SFT, RL, and on-policy distillation from a state-distribution perspective |
reinforcement learning distillation large language model |
|
|
| 5 |
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning |
SCRL: reinforcement learning with a subproblem-based curriculum that improves LLM reasoning and addresses credit assignment |
reinforcement learning curriculum learning IMoS |
|
|
| 6 |
Asymmetric Virtual Memory Paging for Hybrid Mamba-Transformer Inference |
Proposes Asymmetric Virtual Memory Paging (AVMP) to optimize memory management for hybrid Mamba-Transformer inference. |
Mamba SSM state space model |
|
|
| 7 |
The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning |
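Proposes the Matching Principle, which regularizes the encoder with an estimated nuisance covariance to make learned representations robust. |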
DPO representation learning |
|
|
| 8 |
From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching |
Proposes scFM, which learns single-cell gene expression dynamics via conditional flow matching and addresses missing time-series data. |
flow matching latent dynamics |
|
|
| 9 |
Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks |
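Proposes reinforcement learning policies based on Chebyshev polynomials, significantly improving performance on low-dimensional control tasks |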
reinforcement learning PPO |
|
|
| 10 |
Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning |
Proposes Direction-Adaptive Self-Distillation (DASD) to improve LLM exploration and accuracy in mathematical reasoning |
distillation privileged information |
|
|
| 11 |
Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration |
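Proposes a 3D exploration method based on episodic context and persistent worlds that addresses local looping in curiosity-driven exploration. |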
reinforcement learning predictive model 3D reconstruction |
✅ |
|
| 12 |
MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data |
MambaGaze: cognitive load assessment using bidirectional Mamba and explicit missing-data modeling |
Mamba |
|
|
| 13 |
The Distillation Game: Adaptive Attacks & Efficient Defenses |
Proposes an adversarial-game framework for distillation attacks and defenses, and designs PoE, an efficient defense method. |
distillation |
✅ |
|
| 14 |
Abstraction for Offline Goal-Conditioned Reinforcement Learning |
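Proposes a framework based on relativized options and hierarchical abstraction for offline goal-conditioned reinforcement learning |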
reinforcement learning |
|
|
| 15 |
Reinforcement learning for ion shuttling on trapped-ion quantum computers |
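Proposes a reinforcement-learning-based method for optimizing ion shuttling, improving the operational efficiency of trapped-ion quantum computers. |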
reinforcement learning |
|
|
| 16 |
Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning |
Proposes Qreg+NWLU to address forgetting in multi-cyclic continual reinforcement learning |
reinforcement learning |
|
|
| 17 |
Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs |
Proposes RGoT, which uses reinforcement learning to adaptively generate graphs of thoughts for LLMs, improving complex problem solving |
reinforcement learning large language model |
|
|
| 18 |
One-Way Policy Optimization for Self-Evolving LLMs |
Proposes one-way policy optimization to address training instability in large language models |
reinforcement learning large language model |
|
|
| 19 |
Toward Understanding Adversarial Distillation: Why Robust Teachers Fail |
Explains why robust teachers fail in adversarial distillation: inconsistency on the robustly unlearnable set |
distillation |
|
|
| 20 |
PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference |
PhylaFlow: phylogenetic inference using hybrid flow matching in BHV tree space |
flow matching |
|
|
| 21 |
Hybrid Kolmogorov-Arnold Network and XGBoost Framework for Week-Ahead Price Forecasting in Australia's National Electricity Market |
Proposes a hybrid KAN+XGBoost framework for week-ahead electricity price forecasting in Australia's National Electricity Market. |
MAE penetration |
|
|
| 22 |
Survive or Collapse: The Asymmetric Roles of Data Gating and Reward Grounding in Self-Play RL |
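Reveals the asymmetric roles of data gating and reward grounding in self-play reinforcement learning, highlighting data gating as key to stability. |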
reinforcement learning reward design |
|
|
| 23 |
OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning |
OPPO: a token-level credit assignment method for LLM reasoning based on Bayesian value recursion |
reinforcement learning distillation |
|
|