| 1 |
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization |
ISEP: implicit support expansion for offline reinforcement learning via stochastic policy optimization |
reinforcement learning offline reinforcement learning flow matching |
|
|
| 2 |
AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models |
AURORA: geometric representation learning via contextual orthogonalization for the healthcare domain |
representation learning distillation foundation model |
|
|
| 3 |
PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics |
Proposes PH-Dreamer, a physics-driven world model based on Port-Hamiltonian generative dynamics, improving performance on control tasks. |
world model world models dreamer |
|
|
| 4 |
Foundation Models for Credit Risk Prediction: A Game Changer? |
Uses pretrained tabular foundation models to improve credit risk prediction, especially in small-sample settings. |
predictive model large language model foundation model |
|
|
| 5 |
Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees |
Proposes a knowledge distillation method that compresses pretrained tabular models into CPU-ready gradient-boosted trees for faster inference. |
teacher-student distillation foundation model |
|
|
| 6 |
UTOPYA: A Multimodal Deep Learning Framework for Physics-Informed Anomaly Detection and Time-Series Prediction |
UTOPYA: a multimodal deep learning framework for physics-informed anomaly detection and time-series prediction |
curriculum learning distillation multimodal |
|
|
| 7 |
Distilling Tabular Foundation Models for Structured Health Data |
Proposes a distillation method for tabular foundation models on structured health data, enabling lightweight deployment. |
distillation foundation model |
|
|
| 8 |
KairosHope: A Next-Generation Time-Series Foundation Model for Specialized Classification via Dual-Memory Architecture |
KairosHope: a next-generation time-series foundation model for specialized classification, based on a dual-memory architecture |
contrastive learning foundation model |
|
|
| 9 |
TabH2O: A Unified Foundation Model for Tabular Prediction |
TabH2O: a unified foundation model for tabular prediction that performs classification and regression in a single forward pass. |
curriculum learning foundation model |
|
|
| 10 |
Heterogeneous Tasks Offloading in Vehicular Edge Computing: A Federated Meta Deep Reinforcement Learning Approach |
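Proposes the FedMAGS framework, addressing privacy preservation and fast adaptation for heterogeneous task offloading in vehicular edge computing |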
reinforcement learning deep reinforcement learning |
|
|
| 11 |
General Preference Reinforcement Learning |
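Proposes General Preference Reinforcement Learning (GPRL) to address the difficulty of designing reward functions for open-domain LLM tasks. |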
reinforcement learning large language model |
|
|
| 12 |
Post-Trained MoE Can Skip Half Experts via Self-Distillation |
ZEDA: uses self-distillation to let post-trained MoE models skip half of their experts, improving inference efficiency |
distillation instruction following |
|
|
| 13 |
Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework |
Proposes ProRL, an interpretable programmatic reinforcement learning framework for the job-shop scheduling problem. |
reinforcement learning deep reinforcement learning DRL |
✅ |
|
| 14 |
Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers |
Uses reinforcement learning to synthesize reusable solvers, improving LLM efficiency on combinatorial optimization problems |
reinforcement learning large language model |
|
|
| 15 |
FedSDR: Federated Self-Distillation with Rectification |
FedSDR: federated self-distillation with rectification, addressing heterogeneity in federated fine-tuning of large language models |
distillation large language model |
|
|
| 16 |
Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning |
Proposes CodeThinker, which improves the code reasoning capabilities of LLMs via consistency-based reinforcement learning |
reinforcement learning large language model |
|
|
| 17 |
$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control |
Proposes the f-OPD framework, which stabilizes long-horizon on-policy distillation training through freshness-aware control. |
distillation large language model |
|
|
| 18 |
Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights |
Proposes a reinforcement-learning-based method for modelling customer trajectories to optimize retail layouts. |
reinforcement learning PULSE |
|
|
| 19 |
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning |
AMARIS: a memory-augmented rubric improvement system for rubric-based reinforcement learning |
reinforcement learning reward shaping |
|
|
| 20 |
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents |
HINT-SD: a targeted hindsight self-distillation method for long-horizon agents |
reinforcement learning distillation |
|
|
| 21 |
Federated Martingale Posterior Sampling |
Proposes Federated Martingale Posterior (FMP) sampling, addressing the difficulty of specifying priors in federated Bayesian neural networks. |
predictive model large language model |
|
|
| 22 |
Graph Hierarchical Recurrence for Long-Range Generalization |
Proposes the Graph Hierarchical Recurrence (GHR) framework to address long-range generalization in graph neural networks. |
representation learning foundation model |
|
|
| 23 |
Alignment Dynamics in LLM Fine-Tuning |
Proposes an alignment dynamics framework that explains and predicts alignment fragility and recovery in LLM fine-tuning |
reinforcement learning large language model |
|
|
| 24 |
Privacy Preserving Reinforcement Learning with One-Sided Feedback |
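Proposes the POOL algorithm for privacy-preserving reinforcement learning with one-sided feedback in multi-dimensional continuous state-action spaces |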
reinforcement learning |
|
|
| 25 |
Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning |
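Proposes an interaction-breaking adversarial learning framework to improve the robustness of multi-agent reinforcement learning |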
reinforcement learning |
|
|
| 26 |
Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables |
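Presents Multi-site PPG, a multi-site physiological dataset for evaluating the heart-rate monitoring performance of emerging wearables in real-world settings. |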
MAE TAMP |
|
|
| 27 |
Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization |
Proposes BiKD, which uses bilevel optimization to balance sample-level loss weights in knowledge distillation for imbalanced learning. |
distillation |
|
|
| 28 |
Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics |
Proposes a knowledge-distillation-based agentic cost-aware query planner that optimizes resource-constrained queries in big data analytics. |
distillation |
✅ |
|
| 29 |
Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models |
Proposes a self-distillation method to optimize spectral shrinkage estimators in spiked covariance models |
distillation |
|
|
| 30 |
COOPO: Cyclic Offline-Online Policy Optimization Algorithm |
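Proposes the COOPO algorithm, which improves the sample efficiency and performance of reinforcement learning through cyclic offline-online policy optimization. |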
reinforcement learning offline reinforcement learning |
|
|
| 31 |
DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization |
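DiPRL: learns discrete programmatic policies via architecture entropy regularization, improving performance on reinforcement learning tasks |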
reinforcement learning deep reinforcement learning |
|
|