| # | Title | Summary | Keywords | ✅ |
|---|---|---|---|---|
| 1 | Offline Reinforcement Learning with Generative Trajectory Policies | Proposes Generative Trajectory Policies (GTP) to improve the performance and efficiency of generative models in offline RL. | reinforcement learning, offline RL | |
| 2 | ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding | ReLook: vision-grounded reinforcement learning with a multimodal LLM critic for agentic web coding. | reinforcement learning, large language model, multimodal | |
| 3 | Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models | Proposes Boundary-Guided Policy Optimization (BGPO) to address the memory bottleneck in RL training of diffusion large language models. | reinforcement learning, large language model | ✅ |
| 4 | How Reinforcement Learning After Next-Token Prediction Facilitates Learning | Proposes a framework of reinforcement learning after next-token prediction to improve LLM generalization on reasoning tasks. | reinforcement learning, large language model, chain-of-thought | |
| 5 | Vision-LLMs for Spatiotemporal Traffic Forecasting | Proposes ST-Vision-LLM, recasting spatiotemporal traffic forecasting as a vision-language problem to improve forecasting accuracy. | reinforcement learning, spatiotemporal, large language model | |
| 6 | PhysioME: A Robust Multimodal Self-Supervised Framework for Physiological Signals with Missing Modalities | PhysioME: a robust multimodal self-supervised learning framework for physiological signals with missing modalities. | contrastive learning, multimodal | |
| 7 | QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs | QeRL: a quantization-enhanced reinforcement learning framework for LLMs that improves efficiency and strengthens exploration. | reinforcement learning, large language model | |
| 8 | ADARL: Adaptive Low-Rank Structures for Robust Policy Learning under Uncertainty | Proposes adaptive low-rank structures (AdaRL) for robust policy learning under uncertainty. | reinforcement learning, policy learning, SAC | |
| 9 | Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling | Proposes CART to strengthen the robustness of Decision Transformers in adversarial stochastic games. | reinforcement learning, decision transformer, transformer policy | |
| 10 | Query-Specific GNN: A Comprehensive Graph Representation Learning Method for Retrieval Augmented Generation | Proposes a Query-Specific Graph Neural Network (QSGNN) to improve knowledge retrieval for multi-hop questions in retrieval-augmented generation. | representation learning, large language model | ✅ |
| 11 | AMiD: Knowledge Distillation for LLMs with $α$-mixture Assistant Distribution | Proposes AMiD, which uses an α-mixture assistant distribution for LLM knowledge distillation, improving performance and training stability. | distillation, large language model | |
| 12 | Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization | Proposes RITE to address cross-domain generalization in tool-augmented reinforcement learning. | reinforcement learning, large language model | |
| 13 | Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning | Proposes Cog-Rethinker to address sample-efficiency issues in LLM reasoning. | reinforcement learning, large language model | |
| 14 | Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM | Proposes the RFTHGS framework, which finetunes a small LLM with reinforcement learning to generate high-performance crossover operators for the HGS solver on CVRP. | reinforcement learning, large language model | |
| 15 | Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs | Proposes the AT-GRPO algorithm to tackle policy optimization in multi-agent LLM collaboration. | reinforcement learning, large language model | ✅ |
| 16 | Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning | Proposes adaptive entropy regularization (AER) to address policy entropy collapse in LLM reinforcement learning. | reinforcement learning, large language model | |
| 17 | GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving | Proposes GAR, a generative adversarial reinforcement learning framework for formal theorem proving that improves training efficiency and performance. | reinforcement learning, curriculum learning | |
| 18 | Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning | Proposes efficient restart strategies for non-stationary model-free reinforcement learning to improve dynamic regret. | reinforcement learning | |
| 19 | MEET-Sepsis: Multi-Endogenous-View Enhanced Time-Series Representation Learning for Early Sepsis Prediction | MEET-Sepsis: multi-endogenous-view enhanced time-series representation learning for early sepsis prediction. | representation learning | ✅ |
| 20 | Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony | ROLL Flash: accelerating reinforcement learning training for RLVR and agentic tasks via asynchrony. | reinforcement learning, large language model | |
| 21 | Emergence of hybrid computational dynamics through reinforcement learning | Reinforcement learning drives recurrent neural networks to develop emergent hybrid computational dynamics, improving performance on decision-making tasks. | reinforcement learning | |
| 22 | Robust Photoplethysmography Signal Denoising via Mamba Networks | Proposes DPNet, a Mamba-network-based model for robust photoplethysmography signal denoising, improving heart-rate estimation on wearable devices. | Mamba | |
| 23 | Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation | Proposes PerSyn: personalized data synthesis via router-guided multi-teacher distillation, improving student model performance. | distillation | |
| 24 | Don't Walk the Line: Boundary Guidance for Filtered Generation | Proposes a boundary-guidance method that improves the safety and utility of generative models by steering outputs away from the classifier's decision boundary. | reinforcement learning, reward design | |