| 1 | AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models | AcceRL: a distributed asynchronous reinforcement learning and world model framework for VLA models | reinforcement learning, world model, vision-language-action | | |
| 2 | RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach | RE-SAC: disentangles aleatoric and epistemic uncertainty for stable, robust bus fleet control | reinforcement learning, deep reinforcement learning, DRL | | |
| 3 | STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation | STEP: pretrains a scientific time-series encoder via cross-domain distillation to improve representation learning | representation learning, distillation, foundation model | | |
| 4 | HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning | Proposes HISR, which uses hindsight-information-modulated segmental process rewards to improve multi-turn agentic reinforcement learning | reinforcement learning, large language model | | |
| 5 | Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards | Proposes discounted Beta-Bernoulli reward estimation to improve the sample efficiency of reinforcement learning with verifiable rewards | reinforcement learning, large language model | | |
| 6 | Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control | Proposes an adaptive stock price prediction framework that combines autoencoder-gated dual node Transformers with reinforcement learning control | reinforcement learning, SAC | | |
| 7 | CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks | CausalRM: causal-theoretic reward modeling for RLHF from observational user feedback | reinforcement learning, RLHF | | |
| 8 | Enhancing Pretrained Model-based Continual Representation Learning via Guided Random Projection | Proposes SCL-MGSM, which uses guided random projection to improve pretrained models in continual representation learning | representation learning | | |
| 9 | Context Bootstrapped Reinforcement Learning | Proposes Context Bootstrapped Reinforcement Learning (CBRL) to improve exploration efficiency on complex reasoning tasks | reinforcement learning | | |
| 10 | Are complicated loss functions necessary for teaching LLMs to reason? | Proposes RGRA, a simplified REINFORCE method that improves LLM mathematical reasoning without complex constraints | PPO, large language model | | |
| 11 | Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning | Proposes Difficulty-Differentiated Policy Optimization (DDPO) to address overthinking and underthinking in large-model reasoning | reinforcement learning | ✅ | |
| 12 | iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery | iSatCR: a graph-neural-network-empowered method for joint computing and routing of LEO satellite data | reinforcement learning, deep reinforcement learning | | |
| 13 | AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models | AcceRL: an efficient distributed asynchronous reinforcement learning framework for vision-language-action models | reinforcement learning, world model, vision-language-action | ✅ | |
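| 14 | Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning | Proposes a hierarchical reinforcement learning framework that optimizes non-pharmaceutical interventions for multi-cluster outbreak control under resource constraints | reinforcement learning | | |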
| 15 | Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement Learning | Proposes Difficulty-Differentiated Policy Optimization (DDPO) to address overthinking and underthinking in large-model reasoning | reinforcement learning | ✅ | |