| 1 |
PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning |
PCL-Reasoner-V1.5: improving mathematical reasoning with offline reinforcement learning |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 2 |
A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem |
Proposes a curriculum-learning-based deep reinforcement learning framework for the electric vehicle routing problem. |
reinforcement learning deep reinforcement learning DRL |
|
|
| 3 |
Multimodal Rumor Detection Enhanced by External Evidence and Forgery Features |
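Proposes a multimodal rumor detection model that fuses external evidence with forgery features to improve rumor detection accuracy on social media. |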
contrastive learning multimodal |
|
|
| 4 |
CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning |
CLEANER: self-purified trajectories improve agentic reinforcement learning performance |
reinforcement learning large language model |
|
|
| 5 |
Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control |
FluidGym: the first fully differentiable benchmark platform for reinforcement learning in flow control |
reinforcement learning PPO SAC |
✅ |
|
| 6 |
CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation |
CoScale-RL: efficient post-training of large models by co-scaling data and computation. |
reinforcement learning distillation foundation model |
|
|
| 7 |
Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data |
Outcome-based reinforcement learning can lead Transformers to reason, but only with suitable data |
reinforcement learning chain-of-thought |
|
|
| 8 |
Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning |
Proposes a memory-rewriting benchmark that reveals the limitations of existing memory models in reinforcement learning |
reinforcement learning |
✅ |
|
| 9 |
Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning |
Proposes the EGRL-SR framework to address search efficiency in symbolic regression |
reinforcement learning |
|
|
| 10 |
What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study |
Proposes Reasoning-QAT, an efficient low-bit quantization-aware training method for reasoning LLMs. |
reinforcement learning distillation |
|
|