| 1 | ACE-RLHF: Automated Code Evaluation and Socratic Feedback Generation Tool using Large Language Models and Reinforcement Learning with Human Feedback | Proposes ACE-RLHF, a tool that uses LLMs and RLHF to automatically generate code evaluations and Socratic feedback. | reinforcement learning, RLHF, large language model | | |
| 2 | A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization | Proposes Pairwise-RL, which optimizes RLHF through a unified pairwise framework, improving reward-model calibration and policy optimization. | reinforcement learning, PPO, RLHF | | |
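| 3 | Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation | Proposes an inverse reinforcement learning method based on attention mechanisms and graph convolutions for multi-agent task allocation. | reinforcement learning, deep reinforcement learning, DRL | | |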
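| 4 | Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Proposes AdaRFT, which uses adaptive curriculum learning to improve the efficiency and accuracy of reinforcement finetuning for mathematical reasoning. | PPO, curriculum learning, IMoS | | |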
| 5 | A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks | Proposes PDPPO, an algorithm built on post-decision states and dual critic networks that improves reinforcement learning performance in stochastic environments. | reinforcement learning, deep reinforcement learning, PPO | | |
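| 6 | Bidirectional Hierarchical Protein Multi-Modal Representation Learning | Proposes a bidirectional hierarchical multimodal protein representation learning framework that fuses sequence and structure information. | representation learning, multimodal | | |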
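| 7 | Gaussian Mixture Flow Matching Models | Proposes Gaussian Mixture Flow Matching models (GMFlow), which improve few-step sampling quality and mitigate color oversaturation in image generation. | flow matching, classifier-free guidance | | |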
| 8 | Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning | Proposes a multi-agent reinforcement learning method for large-scale mixed-traffic intersection control. | reinforcement learning, penetration | | |
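| 9 | The Role of Environment Access in Agnostic Reinforcement Learning | Proposes a reinforcement learning approach that makes no assumptions about the environment, to address sample efficiency. | reinforcement learning, policy learning | | |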
| 10 | Playing Non-Embedded Card-Based Games with Reinforcement Learning | Proposes a non-embedded reinforcement learning strategy for playing real-time Clash Royale matches from visual input. | reinforcement learning, offline reinforcement learning | ✅ | |
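| 11 | RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy | Proposes RLBayes, an algorithm that uses a reinforcement-learning-based search strategy to tackle the NP-hard problem of Bayesian network structure learning. | reinforcement learning | | |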