| 1 |
Prior-Aligned Data Cleaning for Tabular Foundation Models |
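Proposes the L2C2 framework, which uses reinforcement learning for prior-aligned data cleaning to improve the performance of tabular foundation models. |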
reinforcement learning reward design foundation model |
|
|
| 2 |
DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control |
Proposes DGLight to optimize large language models for traffic signal control. |
reinforcement learning large language model |
✅ |
|
| 3 |
Conditional Flow Matching for Probabilistic Downscaling of Maximum 3-day Snowfall in Alaska |
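Proposes WxFlow, which uses conditional flow matching for probabilistic downscaling of maximum 3-day snowfall in Alaska, improving spectral fidelity. |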
flow matching physically plausible |
✅ |
|
| 4 |
Diverse Image Priors for Black-box Data-free Knowledge Distillation |
Proposes DIP-KD to address insufficient data diversity in black-box data-free knowledge distillation. |
contrastive learning distillation |
|
|
| 5 |
Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models |
Reveals limitations of epistemic uncertainty quantification in latent space models: biased "dreams". |
reinforcement learning dreamer latent dynamics |
|
|
| 6 |
Zero Shot Coordination for Sparse Reward Tasks with Diverse Reward Shapings |
Proposes a zero-shot coordination method based on an ensemble of random reward shapings to address cooperation in sparse-reward tasks. |
reinforcement learning reward shaping |
|
|
| 7 |
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient |
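For policy gradient methods, proposes a categorization of imperfect rewards that accounts for beneficial errors, applied to language model training. |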
reinforcement learning RLHF reward design |
|
|
| 8 |
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment |
Shows that sustained gradient alignment drives subliminal learning in MNIST auxiliary logit distillation. |
distillation |
|
|
| 9 |
Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe in the Face of Uncertainty |
Proposes the Dyna-SAuR algorithm, which improves reinforcement learning safety by learning a dynamics model and a safety filter. |
reinforcement learning |
|
|
| 10 |
Knowledge Distillation Must Account for What It Loses |
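Argues that knowledge distillation must account for information loss, with attention to the reliability of model capabilities. |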
distillation |
|
|
| 11 |
Elite-Driven Support Vector Machines for Classification |
Proposes Elite-Driven SVM, which improves classification performance through elite-sample guidance and incorporates prior knowledge. |
teacher-student distillation |
|
|