| 1 |
Momentum Based Reward Design for Low Emission Traffic Signal Control |
Proposes a momentum-based reward function for optimizing low-emission traffic signal control |
reinforcement learning deep reinforcement learning DRL |
|
|
| 2 |
Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning |
Proposes Hista and Numca, which effectively improve state-value estimation in LLM reinforcement learning |
reinforcement learning PPO large language model |
|
|
| 3 |
PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning |
PEARL: trains Socratic tutoring models with pedagogically aligned reinforcement learning |
reinforcement learning large language model |
|
|
| 4 |
How's it going? Reinforcement learning in language models recruits a functional welfare axis |
Reinforcement learning activates a functional welfare axis in language models, which affects model behavior |
reinforcement learning |
|
|
| 5 |
MIC: Maximizing Informational Capacity in Adaptive Representations via Isotropic Subspace Alignment |
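Proposes the MIC framework, which maximizes the informational capacity of adaptive representations via isotropic subspace alignment, especially in high-compression settings. |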
representation learning distillation |
|
|
| 6 |
Rethinking Post-Training Recipes for Multimodal Time-Series Forecasting |
Proposes PostTime, which post-trains an LLM to correct numerical time-series forecasts, enabling multimodal time-series forecasting. |
reinforcement learning foundation model multimodal |
|
|
| 7 |
TRACER: Persistent Regularization for Robust Multimodal Finetuning |
Proposes TRACER, which uses persistent regularization to improve the robustness and generalization of multimodal fine-tuning |
contrastive learning distillation multimodal |
✅ |
|
| 8 |
GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models |
Proposes GDSD: reinforcement learning for diffusion language models via guided denoiser self-distillation |
reinforcement learning distillation large language model |
✅ |
|
| 9 |
Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences |
Proposes Chess-World-Model: an exact state-tracking benchmark built on 10 million chess games |
world model world models Mamba |
|
|
| 10 |
LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation |
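LoopFM: uses historical representations to make knowledge transfer from foundation models to vertical models in recommender systems more efficient |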
distillation foundation model |
|
|
| 11 |
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification |
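Proposes the DSFM model, based on a wavelet transform and spectral flow matching, for generating fMRI time series and identifying brain disorders. |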
flow matching spatiotemporal |
✅ |
|
| 12 |
Learning to Perceive the World Through Control: Empowerment-Based Representation Learning |
Proposes a control-based representation learning method that extracts control-relevant features by maximizing control capability. |
reinforcement learning representation learning |
|
|
| 13 |
Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics |
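Proposes a scalable constrained multi-agent reinforcement learning method based on state augmentation and a consensus mechanism, addressing resource constraints in systems with separable dynamics. |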
reinforcement learning policy learning |
|
|
| 14 |
Bounded Behavioral Indistinguishability for Black-Box LLM Distillation |
Proposes bounded behavioral indistinguishability to improve the evaluation of black-box LLM distillation |
teacher-student distillation |
|
|
| 15 |
Calibrated Preference Learning: The Case of Label Ranking |
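For the label ranking task, the paper proposes a calibrated learning framework that improves the reliability of ranking predictions. |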
preference learning RLHF |
|
|
| 16 |
In-Context Reward Adaptation for Robust Preference Modeling |
Proposes the In-Context Reward Adaptation framework to address the generalization problem of reward models in RLHF |
reinforcement learning RLHF large language model |
|
|
| 17 |
ESPO: Early-Stopping Proximal Policy Optimization |
ESPO: early-stopping proximal policy optimization, which accelerates LLM reinforcement learning and improves mathematical reasoning |
reinforcement learning PPO large language model |
|
|
| 18 |
Information-Directed Offline-to-Online Reinforcement Learning |
Proposes an information-directed offline-to-online reinforcement learning method to address the exploration problem |
reinforcement learning offline RL |
|
|
| 19 |
LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation |
Proposes LARK to address trajectory selection in reasoning distillation |
distillation |
✅ |
|
| 20 |
Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets |
Proposes a statistical-embedding-based method for similarity, retrieval, and interpretable alignment of tabular data |
predictive model large language model |
|
|
| 21 |
On Distributional Reinforcement Learning in Chaotic Dynamical Systems |
Proposes distributional reinforcement learning based on the Wasserstein distance to address high variance in chaotic dynamical systems |
reinforcement learning |
|
|
| 22 |
RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood |
Proposes RL2ML, which connects reinforcement learning and maximum likelihood to optimize language model training. |
reinforcement learning |
|
|
| 23 |
Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies |
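Proposes the AWD regularization method to address forgetting when fine-tuning LLMs with evolution strategies (ES), improving continual learning |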
reinforcement learning large language model |
|
|
| 24 |
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption |
Proposes the SW-DRSO framework, which makes set representation learning more robust to element corruption at inference time |
representation learning |
|
|
| 25 |
A Predictive Law for On-Policy Self-Distillation From World Feedback |
Proposes a predictive law for on-policy self-distillation from world feedback, improving post-training efficiency. |
distillation |
|
|
| 26 |
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training |
Proposes the LaRA framework, which detects LLM data contamination in RL post-training through layer-wise representation analysis |
reinforcement learning large language model |
|
|
| 27 |
Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets |
Proposes an optimal allocation method for federated probe-logit distillation under heterogeneous bandwidth budgets |
distillation |
|
|
| 28 |
On-Policy Replay for Continual Supervised Fine-Tuning |
Proposes On-Policy Replay to address catastrophic forgetting of large language models in continual supervised fine-tuning. |
distillation large language model |
✅ |
|
| 29 |
When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer |
Uses reinforcement learning with a novelty reward to improve LLM cross-domain reasoning from puzzles to math |
reinforcement learning chain-of-thought |
|
|
| 30 |
Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents |
Proposes MF-Diffuser to address the scaling problem in offline multi-agent reinforcement learning |
reinforcement learning offline reinforcement learning |
|
|