| 8 |
Kimi k1.5: Scaling Reinforcement Learning with LLMs |
Kimi k1.5: Improves the reasoning ability of multimodal large language models through reinforcement learning and long-context modeling |
reinforcement learning large language model |
|
|
| 9 |
UAV-assisted Internet of Vehicles: A Framework Empowered by Reinforcement Learning and Blockchain |
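Proposes a UAV-assisted Internet of Vehicles framework based on reinforcement learning and blockchain, enabling trustworthy relay selection and coordination. |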
reinforcement learning deep reinforcement learning PPO |
|
|
| 10 |
Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling |
Proposes the SOCD algorithm to address offline reinforcement learning in multi-user delay-constrained scheduling
reinforcement learning offline reinforcement learning diffusion policy |
|
|
| 11 |
Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization |
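Proposes a reinforcement-learning-based framework for the automated design of differential evolution algorithms for black-box optimization. |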
reinforcement learning |
|
|
| 12 |
Deep Learning-Based Identification of Inconsistent Method Names: How Far Are We? |
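Evaluates the limitations of deep learning methods for identifying inconsistent method names and proposes directions for improvement. |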
contrastive learning large language model |
|
|
| 13 |
HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation |
HEPPO-GAE: A hardware-efficient accelerator for generalized advantage estimation in proximal policy optimization |
reinforcement learning PPO |
|
|