| 37 |
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance |
FEST: reinforcement learning with verifiable rewards guided by a few samples, improving sample efficiency |
reinforcement learning large language model chain-of-thought |
|
|
| 38 |
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy |
ActFocus: resolving the action bottleneck in agentic reinforcement learning through token-level energy analysis |
reinforcement learning PPO large language model |
|
|
| 39 |
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models |
DiffusionOPD: a unified multi-task framework based on on-policy distillation in diffusion models |
reinforcement learning PPO distillation |
|
|
| 40 |
Self-Distilled Agentic Reinforcement Learning |
Proposes SDAR, which uses self-distillation to improve reinforcement learning for LLM agents on complex interactive tasks |
reinforcement learning distillation |
|
|
| 41 |
Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions |
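Proposes a cognitive-uncertainty-guided knowledge distillation framework to improve the accuracy of student misconception classification. |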
distillation |
✅ |
|
| 42 |
PreFT: Prefill-only finetuning for efficient inference |
Proposes PreFT to address the efficiency of multi-adapter serving |
reinforcement learning large language model |
|
|
| 43 |
Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication |
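Proposes semantic-importance-aware constellation design to improve the robustness of semantic communication systems under channel interference. |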
reinforcement learning deep reinforcement learning |
|
|
| 44 |
Toward World Modeling of Physiological Signals with Chaos-Theoretic Balancing and Latent Dynamics |
NormWear-2: models physiological signals with chaos-theoretic balancing and latent dynamics, enabling multi-scale prediction. |
world model world models latent dynamics |
|
|
| 45 |
DRL-STAF: A Deep Reinforcement Learning Framework for State-Aware Forecasting of Complex Multivariate Hidden Markov Processes |
DRL-STAF: a deep reinforcement learning framework for state-aware forecasting of complex multivariate hidden Markov processes |
reinforcement learning deep reinforcement learning DRL |
|
|
| 46 |
Controllable Molecular Generative Foundation Models |
CoMole: a controllable molecular generative foundation model for heterogeneous design tasks. |
reinforcement learning MAE foundation model |
|
|
| 47 |
Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning |
Proposes the Conservative Peng's Q($λ$) (CPQL) algorithm for conservative value estimation in offline reinforcement learning |
reinforcement learning offline RL offline reinforcement learning |
✅ |
|
| 48 |
Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning |
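Studies the impact of plasticity interventions on backdoor attacks in deep reinforcement learning, and proposes the SCC framework and detection metrics. |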
reinforcement learning deep reinforcement learning DRL |
|
|
| 49 |
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero |
GRLO: exploring generalizable reinforcement learning in open-ended environments from scratch |
reinforcement learning RLHF large language model |
✅ |
|
| 50 |
Fast Rates for Inverse Reinforcement Learning |
Proposes entropy-regularized min-max inverse reinforcement learning to achieve faster learning rates |
reinforcement learning inverse reinforcement learning |
|
|
| 51 |
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning |
Evaluates LLM fine-tuning through a JEPA audit: a study of decoupling representation from reward |
JEPA Joint-Embedding Predictive Architecture joint-embedding predictive architecture |
|
|
| 52 |
Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement |
Crys-JEPA: accelerating crystal discovery through embedding screening and generative refinement |
JEPA Joint-Embedding Predictive Architecture joint-embedding predictive architecture |
|
|
| 53 |
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment |
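Proposes BBCritic, which reframes GUI critique as a continuous semantic alignment problem and significantly improves the generalization of GUI agents. |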
contrastive learning affordance zero-shot transfer |
|
|
| 54 |
Time-Varying Deep State Space Models for Sequences with Switching Dynamics |
Proposes time-varying deep state space models for modeling sequences with switching dynamics. |
SSM state space model |
|
|
| 55 |
Learning from Language Feedback via Variational Policy Distillation |
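Proposes the Variational Policy Distillation (VPD) framework to address teacher-policy stagnation in reinforcement learning from language feedback. |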
reinforcement learning distillation |
|
|
| 56 |
AudioMosaic: Contrastive Masked Audio Representation Learning |
AudioMosaic: an audio representation learning method based on contrastive learning and masking |
representation learning contrastive learning |
✅ |
|
| 57 |
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients |
Proposes the Hybrid Policy Optimization (HPO) algorithm for reinforcement learning in hybrid discrete-continuous action spaces. |
reinforcement learning PPO differentiable simulation |
|
|
| 58 |
Lagrangian Flow Matching: A Least-Action Framework for Principled Path Design |
Lagrangian Flow Matching: probability path design based on the principle of least action |
flow matching |
|
|
| 59 |
Training on Documents About Monitoring Leads to CoT Obfuscation |
Shows that models trained on documents about monitoring can obfuscate their CoT reasoning and evade detection. |
reinforcement learning chain-of-thought |
|
|
| 60 |
Curriculum Learning of Physics-Informed Neural Networks based on Spatial Correlation |
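Proposes a curriculum-learning PINN framework based on spatial correlation to improve the accuracy of partial differential equation solutions. |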
curriculum learning |
✅ |
|
| 61 |
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization |
Proposes the ROAD framework to address data mixing in offline-to-online reinforcement learning |
reinforcement learning |
|
|
| 62 |
MoRe: Modular Representations for Principled Continual Representation Learning on Sequential Data |
MoRe: continual representation learning on sequential data via modular representations |
representation learning |
|
|
| 63 |
Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation |
Proposes OPSA: reducing the safety tax in LLM safety alignment through on-policy self-distillation. |
distillation |
|
|
| 64 |
Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks |
Proposes Selective Alignment Knowledge Distillation (SeAl-KD) to improve the performance of spiking neural networks (SNNs). |
distillation |
✅ |
|
| 65 |
Quantum Advantage in Multi Agent Reinforcement Learning |
A multi-agent reinforcement learning framework based on quantum entanglement that enables agent cooperation beyond classical limits |
reinforcement learning |
|
|