| 1 |
Efficient Preference Poisoning Attack on Offline RLHF |
Proposes an efficient preference poisoning attack targeting the DPO algorithm in offline RLHF |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 2 |
Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning |
Proposes dynamic-size reasoning blocks to address the limitations of fixed-block generation |
reinforcement learning large language model |
✅ |
|
| 3 |
Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models |
Proposes Gradient-Gated DPO, which stabilizes preference optimization in language models and mitigates probability collapse |
reinforcement learning DPO direct preference optimization |
|
|
| 4 |
Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability |
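Proposes a recurrent deep reinforcement learning method for chemotherapy control that improves treatment outcomes under partial observability |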
reinforcement learning deep reinforcement learning TD3 |
|
|
| 5 |
Combining Trained Models in Reinforcement Learning |
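Systematically surveys methods for reusing pretrained models in deep reinforcement learning, analyzing their effectiveness and limitations |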
reinforcement learning deep reinforcement learning DRL |
|
|
| 6 |
Federated Reinforcement Learning for Efficient Mobile Crowdsensing under Incomplete Information |
Proposes the FDRL-PPO algorithm for efficient task participation in mobile crowdsensing under incomplete information |
reinforcement learning deep reinforcement learning PPO |
|
|
| 7 |
Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation |
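Proposes a closed-loop control method for geological CO2 storage that combines history-based reinforcement learning with latent model-based adaptation |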
reinforcement learning latent dynamics teacher-student |
|
|
| 8 |
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management |
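Proposes a meta reinforcement learning approach to wealth management that quickly solves personalized portfolio optimization problems |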
reinforcement learning foundation model |
|
|
| 9 |
Statistical Consistency and Generalization of Contrastive Representation Learning |
Proposes a unified statistical learning theory to address the statistical consistency of contrastive representation learning |
representation learning foundation model |
|
|
| 10 |
Evaluating Tabular Representation Learning for Network Intrusion Detection |
Evaluates tabular representation learning applied to network intrusion detection |
representation learning |
|
|
| 11 |
Middle-mile logistics through the lens of goal-conditioned reinforcement learning |
Proposes a goal-conditioned reinforcement learning method for optimizing middle-mile logistics |
reinforcement learning |
|
|
| 12 |
Binary Rewards and Reinforcement Learning: Fundamental Challenges |
Proposes KL control to address diversity collapse under binary rewards |
reinforcement learning |
|
|
| 13 |
A decoupled diffusion planner that adapts to changing cost limits by using cost-conditioned generation for safety and reward gradients for performance |
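Proposes SDGD, a decoupled diffusion planner that adapts to changing safety constraints and improves offline safe reinforcement learning performance |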
reinforcement learning classifier-free guidance |
|
|