| # | Title | Summary | Keywords |  |
|---|-------|---------|----------|---|
| 1 | medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential Functions | Proposes medR, reward engineering for clinical offline reinforcement learning via tri-drive potential functions (see the reward-shaping sketch below). | reinforcement learning, policy learning, offline reinforcement learning | |
| 2 | Reinforcement Learning with Promising Tokens for Large Language Models | Proposes the RLPT framework, which applies reinforcement learning to promising tokens to improve LLM reasoning. | reinforcement learning, large language model | |
| 3 | Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models | Proposes Entropy-Gated Selective Policy Optimization (EGSPO) for token-level gradient allocation in hybrid training of large language models. | reinforcement learning, PPO, large language model | |
| 4 | Robust Representation Learning in Masked Autoencoders | Shows that the representations learned by masked autoencoders (MAE) are highly robust, especially on image classification tasks. | representation learning, masked autoencoder, MAE | |
| 5 | Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation | Proposes Cobalt, which combines online and offline reinforcement learning to improve multi-turn code generation. | reinforcement learning, offline RL, large language model | ✅ |
| 6 | CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering | Proposes CoCoEmo, which achieves composable and controllable human-like emotional TTS via activation steering. | flow matching, motion synthesis | |
| 7 | Self-Hinting Language Models Enhance Reinforcement Learning | Proposes self-hint-aligned GRPO to address the sparse-reward problem (see the GRPO sketch below). | reinforcement learning, privileged information, large language model | ✅ |
| 8 | Antidistillation Fingerprinting | Proposes antidistillation fingerprinting (ADFP), improving model provenance tracing while reducing the impact on model utility. | distillation, large language model | |
| 9 | CoGenCast: A Coupled Autoregressive-Flow Generative Framework for Time Series Forecasting | Proposes CoGenCast, a coupled autoregressive-flow generative model for time-series forecasting (see the flow-matching sketch below). | flow matching, large language model, multimodal | ✅ |
| 10 | Not All Negative Samples Are Equal: LLMs Learn Better from Plausible Reasoning | Proposes PNS, which improves LLM reasoning via high-quality (plausible) negative samples. | reinforcement learning, large language model, chain-of-thought | |
| 11 | An Approximate Ascent Approach To Prove Convergence of PPO | Proposes an approximate ascent approach that proves the convergence of PPO and addresses advantage-function estimation (see the PPO sketch below). | reinforcement learning, deep reinforcement learning, PPO | |
| 12 | TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT | Proposes Trajectory-Mixed Supervision (TMS) to address catastrophic forgetting caused by policy drift during SFT. | reinforcement learning, large language model, instruction following | |
| 13 | Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL | Proposes PULSE, which exploits weight-update sparsity for communication-efficient distributed reinforcement learning. | reinforcement learning, PULSE, large language model | |
| 14 | StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling | Proposes an RL acceleration method based on psychological regret modeling, addressing slow convergence under sparse rewards. | reinforcement learning, PPO | |
| 15 | Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning | Proposes the Neural Predictor-Corrector (NPC), which solves homotopy problems with reinforcement learning. | reinforcement learning | |
| 16 | SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones | Proposes SAFE-KD, which improves the efficiency of vision backbones via risk-controlled early-exit distillation. | distillation | |
| 17 | Preference-based Conditional Treatment Effects and Policy Learning | Proposes a preference-based conditional treatment effect framework for heterogeneous-effect modeling and policy learning. | policy learning | |
| 18 | Efficient Estimation of Kernel Surrogate Models for Task Attribution | Proposes kernel surrogate models that efficiently estimate the influence of training tasks on a target task. | reinforcement learning, large language model | |
| 19 | Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL | Proposes the Reasoning Cache (RC) algorithm, which uses short-horizon RL to continually improve LLMs on long-horizon reasoning. | reinforcement learning, large language model | |
| 20 | Conditional Flow Matching for Visually-Guided Acoustic Highlighting | Proposes a visually guided acoustic highlighting method based on conditional flow matching, resolving ambiguity in audio remixing. | flow matching | |
| 21 | ContraLog: Log File Anomaly Detection with Contrastive Learning and Masked Language Modeling | Proposes ContraLog, a log-file anomaly detection method based on contrastive learning and masked language modeling (see the InfoNCE sketch below). | contrastive learning | |
| 22 | Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG | Proposes RL-based fine-tuning of a history-aware dense retriever for RAG, improving multi-hop reasoning performance. | reinforcement learning, large language model | |
| 23 | Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing | Proposes a prompt-efficient RLVR method based on rare-event amplification and bidirectional pairing, improving LLM performance on deterministic reasoning tasks. | reinforcement learning, large language model | |
| 24 | Information-Theoretic Multi-Model Fusion for Target-Oriented Adaptive Sampling in Materials Design | Proposes an information-theoretic multi-model fusion approach to adaptive sampling for target-oriented discovery in materials design. | distillation, multimodal | |
| 25 | From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement Learning | Proposes SLOPE, which shapes potential landscapes to address model-based reinforcement learning in sparse-reward environments. | reinforcement learning | |
| 26 | Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning | Proposes Prompt Augmentation, which stably scales up GRPO training on mathematical reasoning and significantly improves model performance. | reinforcement learning, large language model | ✅ |
| 27 | Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost | Proposes Quantized Evolution Strategies (QES), achieving high-precision fine-tuning of quantized LLMs at low-precision cost. | reinforcement learning, large language model | ✅ |
| 28 | CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs | Proposes CoBA-RL, capability-adaptive budget allocation for reinforcement learning in large language models. | reinforcement learning | |
| 29 | Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation: Resolving Information Allocation Ambiguity for Robust Cross-Modal Generalization | Proposes Asymmetric Hierarchical Anchoring (AHA) to resolve information-allocation ambiguity in cross-modal generalization. | representation learning, distillation | |
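
Entries 1 and 25 both build on potential functions for reward design. As generic background (not the papers' specific tri-drive or SLOPE constructions, which the summaries above don't spell out), here is a minimal sketch of classic potential-based reward shaping; the goal-distance potential `phi` is a made-up toy example:

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).

    Shaping of this form is known to preserve the set of optimal policies,
    which is why potential functions are a safe vehicle for reward
    engineering (entry 1) and potential-landscape shaping (entry 25).
    """
    phi_next = 0.0 if done else phi(s_next)  # terminal potential is zero
    return r + gamma * phi_next - phi(s)

# Toy usage with a made-up potential: negative distance to a goal state.
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)
print(shaped_reward(0.0, [0.0, 0.0], [0.5, 0.5], phi))
```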
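
Entries 9 and 20 rely on (conditional) flow matching. A minimal sketch of the standard conditional flow-matching loss under a linear, rectified-flow style interpolation path; the `velocity_net` MLP and toy dimensions are illustrative assumptions, not either paper's architecture:

```python
import torch
import torch.nn as nn

def cfm_loss(velocity_net: nn.Module, x1: torch.Tensor, cond: torch.Tensor):
    """One conditional flow-matching training step with a linear path.

    x1   : data batch [B, D];  cond : conditioning features [B, C]
    The network regresses the path velocity (x1 - x0) at a random time t
    along x_t = (1 - t) * x0 + t * x1, with x0 drawn from a standard normal.
    """
    x0 = torch.randn_like(x1)                        # noise endpoint
    t = torch.rand(x1.size(0), 1, device=x1.device)  # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                       # linear interpolant
    pred = velocity_net(torch.cat([xt, t, cond], dim=-1))
    return ((pred - (x1 - x0)) ** 2).mean()

# Toy usage: 8-dim data, 4-dim conditioning, a small MLP as the velocity net.
net = nn.Sequential(nn.Linear(8 + 1 + 4, 64), nn.SiLU(), nn.Linear(64, 8))
loss = cfm_loss(net, torch.randn(16, 8), torch.randn(16, 4))
loss.backward()
```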
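
Entry 11 analyzes PPO's convergence. For reference, a minimal form of the clipped surrogate objective that such analyses study; the clipping constant 0.2 is the common default, not a value taken from the paper:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO clipped surrogate objective (negated, so it is minimized).

    ratio = pi_new(a|s) / pi_old(a|s); clipping keeps updates near the old
    policy, and `adv` is the estimated advantage whose approximation error
    the paper's convergence analysis is concerned with.
    """
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(ratio * adv, clipped).mean()

# Toy usage on 5 transitions.
logp_new = torch.randn(5, requires_grad=True)
ppo_clip_loss(logp_new, torch.randn(5), torch.randn(5)).backward()
```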
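
Entry 21 combines contrastive learning with masked language modeling. A minimal sketch of a symmetric InfoNCE contrastive loss, a common choice in such setups; ContraLog's actual loss and pairing scheme are not specified in the summary above, so treat this as generic background:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings.

    z1, z2: [B, D] embeddings of two views of the same B items; row i of z1
    is positive with row i of z2, and all other rows act as negatives.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature  # [B, B] scaled cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Toy usage: a batch of 32 pairs of 128-dim embeddings.
print(info_nce(torch.randn(32, 128), torch.randn(32, 128)))
```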
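
Entries 7 and 26 build on GRPO. A minimal sketch of GRPO's group-relative advantage computation, the part both papers modify around; the tensor shapes and 0/1 rewards are illustrative assumptions:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """Group-relative advantages, the core of GRPO.

    rewards: [num_prompts, group_size] scalar rewards for the completions
    sampled per prompt. Normalizing within each group replaces the learned
    value function (critic) that PPO would otherwise need.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy usage: 2 prompts with 4 sampled completions each (0/1 rewards).
r = torch.tensor([[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(r))
```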