| 1 |
(DEMO) Deep Reinforcement Learning Based Resource Allocation in Distributed IoT Systems |
Proposes a deep reinforcement learning based resource allocation framework for distributed IoT systems |
reinforcement learning deep reinforcement learning DRL |
|
|
| 2 |
DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift |
Proposes DRMD to address concept drift in malware detection |
reinforcement learning deep reinforcement learning DRL |
|
|
| 3 |
HAEPO: History-Aggregated Exploratory Policy Optimization |
Proposes HAEPO to address insufficient exploration in long-horizon tasks |
reinforcement learning PPO DPO |
|
|
| 4 |
History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL |
Proposes RhymeRL to address low GPU utilization in reinforcement learning for large language models |
reinforcement learning large language model |
|
|
| 5 |
Re:Frame -- Retrieving Experience From Associative Memory |
Proposes Re:Frame to address the scarcity of expert data in offline reinforcement learning |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 6 |
Beyond Tokens: Enhancing RTL Quality Estimation via Structural Graph Learning |
Proposes the StructRTL framework to improve RTL design quality estimation |
representation learning distillation large language model |
|
|
| 7 |
Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection |
Proposes expectation-maximization based latent variable modeling for UAV-based wildlife protection |
reinforcement learning PPO |
|
|
| 8 |
Stability and Generalization for Bellman Residuals |
Proposes Bellman residual minimization to address consistency issues in offline reinforcement learning |
reinforcement learning offline reinforcement learning inverse reinforcement learning |
|
|
| 9 |
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks |
Proposes an optimal sparsity for mixture-of-experts models to improve performance on reasoning tasks |
reinforcement learning large language model |
✅ |
|
| 10 |
Atrial Fibrillation Prediction Using a Lightweight Temporal Convolutional and Selective State Space Architecture |
Proposes a lightweight deep learning model for early prediction of atrial fibrillation |
Mamba state space model |
|
|
| 11 |
Revisiting associative recall in modern recurrent models |
Examines associative recall in modern recurrent models and strategies for optimizing it |
Mamba SSM |
|
|
| 12 |
Dual-Distilled Heterogeneous Federated Learning with Adaptive Margins for Trainable Global Prototypes |
Proposes dual-distilled heterogeneous federated learning to address prototype margin shrinkage |
contrastive learning distillation |
|
|