| 1 |
BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces |
BraVE: an offline reinforcement learning method for discrete combinatorial action spaces. |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 2 |
Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment |
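Proposes a dual-agent deep reinforcement learning algorithm that addresses the mismatch in decision frequency between dynamic pricing and replenishment. |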
reinforcement learning deep reinforcement learning DRL |
|
|
| 3 |
Unveiling the Role of Expert Guidance: A Comparative Analysis of User-centered Imitation Learning and Traditional Reinforcement Learning |
Compares imitation learning with reinforcement learning to reveal the role of expert guidance in intelligent systems. |
reinforcement learning imitation learning |
|
|
| 4 |
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system |
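Proposes FALCON, a feedback-driven adaptive long/short-term memory reinforced coding optimization system that improves code generation quality. |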
reinforcement learning RLHF large language model |
✅ |
|
| 5 |
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time |
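Proposes fast diffusion language models trained via self-distillation through time, substantially improving generation speed and text quality. |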
distillation large language model |
|
|
| 6 |
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment |
Proposes Faster WIND to accelerate LLM alignment by making iterative Best-of-$N$ distillation more efficient. |
distillation large language model |
|
|
| 7 |
Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization |
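Proposes the RegQPG algorithm, which uses Lipschitz regularization to improve the robustness and generalization of quantum reinforcement learning. |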
reinforcement learning curriculum learning |
|
|
| 8 |
The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure |
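Targets reinforcement learning with large state spaces and proposes transfer reinforcement learning methods based on latent low-rank structure. |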
reinforcement learning |
|
|
| 9 |
A Multi-Agent Reinforcement Learning Testbed for Cognitive Radio Applications |
Extends RFRL Gym to support testing and evaluation of multi-agent reinforcement learning in cognitive radio applications. |
reinforcement learning |
|
|
| 10 |
Flow Matching for Atmospheric Retrieval of Exoplanets: Where Reliability meets Adaptive Noise Levels |
Proposes a Flow Matching-based method for exoplanet atmospheric retrieval that improves reliability and adaptability. |
flow matching |
|
|
| 11 |
Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: Generalized Baselines |
Proposes a safe online reinforcement learning framework for the linear quadratic regulator problem. |
reinforcement learning |
|
|
| 12 |
Disentangled and Self-Explainable Node Representation Learning |
Proposes the DiSeNE framework for generating interpretable, disentangled node representations, making graph data easier to understand. |
representation learning |
|
|
| 13 |
SepMamba: State-space models for speaker separation using Mamba |
SepMamba: speech separation using Mamba-based state-space models. |
Mamba |
|
|
| 14 |
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning |
ODRL: proposes the first comprehensive benchmark for off-dynamics reinforcement learning. |
reinforcement learning |
✅ |
|
| 15 |
Identifying Selections for Unsupervised Subtask Discovery |
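Proposes an unsupervised subtask discovery method based on selection mechanisms, improving generalization in multi-task imitation learning. |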
reinforcement learning imitation learning |
|
|
| 16 |
Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient |
Proposes RL-V2V-GAN, a policy-gradient-based model for few-shot video-to-video generation. |
reinforcement learning deep reinforcement learning |
|
|
| 17 |
Getting By Goal Misgeneralization With a Little Help From a Mentor |
Proposes a mentor-assisted reinforcement learning method that mitigates goal misgeneralization. |
reinforcement learning PPO |
|
|