| 1 |
Policy-labeled Preference Learning: Is Preference Enough for RLHF? |
Proposes Policy-labeled Preference Learning (PPL), which addresses the likelihood-mismatch problem in RLHF through regret modeling.
reinforcement learning offline RL preference learning |
|
|
| 2 |
DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning |
Proposes DYSTIL, which uses large language models to dynamically induce strategies, improving the generalization and efficiency of reinforcement learning.
reinforcement learning large language model |
|
|
| 3 |
Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning |
Proposes a decision-theory-guided deep reinforcement learning method that improves the resilience and efficiency of smart farm networks under adversarial attacks and energy constraints.
reinforcement learning deep reinforcement learning DRL |
|
|
| 4 |
Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach |
Proposes a deep-reinforcement-learning-based joint resource management scheme for UAV-assisted SWIPT-MEC, improving energy efficiency and terminal battery life.
reinforcement learning deep reinforcement learning SAC |
|
|
| 5 |
Interpretable Learning Dynamics in Unsupervised Reinforcement Learning |
Proposes an interpretability framework for URL agents, analyzing how intrinsic motivation mechanisms shape agent behavior and representation learning.
reinforcement learning PPO representation learning |
|
|
| 6 |
Ergodic Generative Flows |
Proposes Ergodic Generative Flows (EGFs) to address the training difficulties of generative flow networks in continuous environments and imitation learning.
reinforcement learning imitation learning flow matching |
|
|
| 7 |
A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM) |
Proposes the LBRM algorithm, which uses a reference model to improve the accuracy of detecting memorized training data in generative models.
predictive model |
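The entry above only names the technique, so here is a minimal, hypothetical sketch of a loss-based membership score with a reference model: the intuition (assumed, not taken from the paper) is that a sample the target model has memorized shows unusually low loss under the target model relative to a reference model trained on disjoint data. All function names and the zero threshold are illustrative assumptions.

```python
import numpy as np

def lbrm_scores(target_losses, reference_losses):
    """Membership score per sample: reference loss minus target loss.

    A large positive score means the target model fits the sample much
    better than the reference model does, which is evidence of memorization.
    """
    return np.asarray(reference_losses, dtype=float) - np.asarray(target_losses, dtype=float)

def predict_members(target_losses, reference_losses, threshold=0.0):
    """Flag samples whose score exceeds an (assumed) decision threshold."""
    return lbrm_scores(target_losses, reference_losses) > threshold
```

The reference model calibrates away the per-sample difficulty that a plain loss threshold would confuse with memorization.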
|
|
| 8 |
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance |
Proposes a knowledge distillation method for speech denoising based on latent-representation alignment with a cosine-distance objective.
distillation |
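As a minimal sketch of the named alignment objective (the paper's exact layer choices and weighting are not given here, so this is an assumed formulation): the student's latent representations are pulled toward the teacher's by minimizing the cosine distance between them, averaged over the batch.

```python
import numpy as np

def cosine_alignment_loss(student_latent, teacher_latent, eps=1e-8):
    """Mean cosine distance 1 - cos(s, t) between latent vectors.

    0 when student and teacher representations are perfectly aligned in
    direction, 2 when they point in opposite directions; magnitude is
    ignored, which is the usual motivation for a cosine objective.
    """
    s = np.asarray(student_latent, dtype=float)
    t = np.asarray(teacher_latent, dtype=float)
    cos = np.sum(s * t, axis=-1) / (
        np.linalg.norm(s, axis=-1) * np.linalg.norm(t, axis=-1) + eps
    )
    return float(np.mean(1.0 - cos))
```

In a real distillation setup this term would be added to the student's denoising loss, with teacher latents detached from the gradient.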
|
|
| 9 |
Absolute Zero: Reinforced Self-play Reasoning with Zero Data |
Proposes Absolute Zero: a self-play reinforcement learning reasoning method that requires no external data.
reinforcement learning large language model |
|
|
| 10 |
Importance Analysis for Dynamic Control of Balancing Parameter in a Simple Knowledge Distillation Setting |
Proposes a method for dynamically adjusting the balancing parameter in knowledge distillation, improving the training efficiency of the student network.
distillation |
|
|
| 11 |
Unraveling the Rainbow: can value-based methods schedule? |
Explores the potential of value-based methods for scheduling: value-learning algorithms perform strongly on the Job-Shop Scheduling Problem.
reinforcement learning deep reinforcement learning |
|