| 1 |
SAIL: Self-Improving Efficient Online Alignment of Large Language Models |
SAIL: Improving large language model performance through self-iterating online alignment |
reinforcement learning RLHF DPO |
|
|
| 2 |
Robust Reinforcement Learning from Corrupted Human Feedback |
Proposes R³M, which models sparse outliers to make RLHF more robust to noisy human feedback. |
reinforcement learning RLHF DPO |
|
|
| 3 |
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty |
KalMamba: An efficient probabilistic state space model for reinforcement learning under uncertainty |
reinforcement learning Mamba SSM |
|
|
| 4 |
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning |
MU-Bench: A comprehensive multitask, multimodal benchmark platform for machine unlearning |
curriculum learning multimodal |
|
|
| 5 |
Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach |
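Proposes a two-phase deep reinforcement learning framework for dynamic resource allocation and client scheduling in energy-harvesting-driven hierarchical federated learning. |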
reinforcement learning deep reinforcement learning |
|
|
| 6 |
Pareto-Optimal Learning from Preferences with Hidden Context |
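Proposes the POPL algorithm, which addresses reinforcement learning alignment under the preferences of multiple groups and achieves Pareto optimality |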
reinforcement learning preference learning RLHF |
|
|
| 7 |
Investigating the Transferability of Code Repair for Low-Resource Programming Languages |
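Studies how well code repair ability transfers to low-resource programming languages and reveals a weak correlation between reasoning ability and code repair ability. |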
distillation large language model chain-of-thought |
|
|
| 8 |
Behaviour Distillation |
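Proposes HaDES, a behaviour distillation method that trains reinforcement learning policies with only a small amount of synthetic data |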
reinforcement learning distillation |
|
|
| 9 |
Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning |
Explores the open problem of optimal regret bounds for kernel-based reinforcement learning |
reinforcement learning |
|
|
| 10 |
Towards General Negotiation Strategies with End-to-End Reinforcement Learning |
Proposes an end-to-end reinforcement learning method based on graph neural networks to address general negotiation strategies |
reinforcement learning |
|
|
| 11 |
From Overfitting to Robustness: Quantity, Quality, and Variety Oriented Negative Sample Selection in Graph Contrastive Learning |
Proposes the NegAmplify framework, which addresses overfitting in graph contrastive learning through cumulative sample selection |
contrastive learning |
|
|
| 12 |
SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning |
Proposes the Symmetry-Invariant Transformer (SiT), which improves reinforcement learning generalisation in environments such as MiniGrid and Procgen. |
reinforcement learning |
|
|
| 13 |
An Idiosyncrasy of Time-discretization in Reinforcement Learning |
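Addresses the time-discretization problem in reinforcement learning and proposes an improved method to align the continuous-time and discrete-time definitions of the return. |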
reinforcement learning |
|
|
| 14 |
DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning |
Proposes DN-CL, which uses contrastive learning to improve deep symbolic regression in noisy settings |
contrastive learning |
|
|