| 1 |
Policy-labeled Preference Learning: Is Preference Enough for RLHF? |
Proposes policy-labeled preference learning to address the insufficiency of preferences in RLHF |
reinforcement learning offline RL preference learning |
|
|
| 2 |
DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning |
Proposes DYSTIL to address strategy generation in reinforcement learning |
reinforcement learning large language model |
|
|
| 3 |
Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning |
Proposes decision theory-guided deep reinforcement learning to improve the resilience and efficiency of smart farm networks |
reinforcement learning deep reinforcement learning DRL |
|
|
| 4 |
Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach |
Proposes a UAV-assisted SWIPT-MEC system to address energy efficiency and computing resource allocation |
reinforcement learning deep reinforcement learning SAC |
|
|
| 5 |
Interpretable Learning Dynamics in Unsupervised Reinforcement Learning |
Proposes an interpretability framework for understanding intrinsic motivation in unsupervised reinforcement learning |
reinforcement learning PPO representation learning |
|
|
| 6 |
Ergodic Generative Flows |
Proposes Ergodic Generative Flows to address challenges in training generative flow networks |
reinforcement learning imitation learning flow matching |
|
|
| 7 |
A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM) |
Proposes the LBRM algorithm to address memorization in generative models |
predictive model |
|
|
| 8 |
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance |
Proposes a cosine distance-based knowledge distillation method to improve speech denoising performance |
distillation |
|
|
| 9 |
Absolute Zero: Reinforced Self-play Reasoning with Zero Data |
Proposes Absolute Zero to address reasoning in reinforcement learning with zero data |
reinforcement learning large language model |
|
|
| 10 |
Importance Analysis for Dynamic Control of Balancing Parameter in a Simple Knowledge Distillation Setting |
Proposes dynamically adjusting the balancing parameter to optimize knowledge distillation |
distillation |
|
|
| 11 |
Unraveling the Rainbow: can value-based methods schedule? |
Proposes value-based methods to address job scheduling problems |
reinforcement learning deep reinforcement learning |
✅ |
|