| 1 |
RLHF Workflow: From Reward Modeling to Online RLHF |
Proposes an online iterative RLHF workflow that improves large language model performance on chatbot benchmarks. |
reinforcement learning, RLHF, large language model |
✅ |
|
| 2 |
Decision Mamba Architectures |
Proposes Decision Mamba and Hierarchical Decision Mamba to improve on the performance of Transformer models in imitation learning. |
imitation learning, decision transformer, Mamba |
✅ |
|
| 3 |
Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks |
Proposes a predictive model for flexible EHD pumps based on Kolmogorov-Arnold Networks, improving accuracy and interpretability. |
predictive model |
|
|
| 4 |
Radio Resource Management and Path Planning in Intelligent Transportation Systems via Reinforcement Learning for Environmental Sustainability |
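Proposes a reinforcement-learning-based approach to radio resource management and path planning that improves the environmental sustainability of intelligent transportation systems. |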
reinforcement learning |
|
|
| 5 |
Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial Optimization |
Proposes a Hamiltonian-based quantum reinforcement learning method for neural combinatorial optimization. |
reinforcement learning |
|
|
| 6 |
Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation |
Proposes a quantum reinforcement learning method based on a hybrid variational quantum ansatz for optimizing database join order. |
reinforcement learning |
|
|
| 7 |
Neural Network Compression for Reinforcement Learning Tasks |
Explores neural network compression for reinforcement learning tasks to improve inference efficiency. |
reinforcement learning |
|
|
| 8 |
GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation |
Proposes GLiRA, a black-box membership inference attack based on knowledge distillation. |
distillation |
|
|
| 9 |
POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning |
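POWQMIX: weights the value-function factorization by recognizing potentially optimal joint actions, improving performance in cooperative multi-agent reinforcement learning. |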
reinforcement learning |
|
|