| 15 |
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model |
Proposes contextual similarity distillation, which efficiently estimates deep-ensemble uncertainty with a single model, improving exploration efficiency in reinforcement learning. |
reinforcement learning, offline reinforcement learning, distillation |
|
|
| 16 |
Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control |
Unicorn: a universal and collaborative reinforcement learning approach for generalizable network-wide traffic signal control. |
reinforcement learning, contrastive learning |
|
|
| 17 |
SPECTra: Scalable Multi-Agent Reinforcement Learning with Permutation-Free Networks |
SPECTra: scalable multi-agent reinforcement learning built on permutation-free networks. |
reinforcement learning, curriculum learning |
✅ |
|
| 18 |
Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet |
Analyzes crash severity of child bicyclists using ARM-Net and MambaNet; MambaNet performs better. |
predictive model, Mamba |
|
|
| 19 |
A Review of DeepSeek Models' Key Innovative Techniques |
A review of the DeepSeek models' key innovative techniques: achieving performance comparable to top closed-source LLMs at low cost. |
reinforcement learning, large language model |
|
|
| 20 |
Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification |
Studies how teacher-model properties affect the performance of knowledge-distilled student models in acoustic scene classification. |
distillation |
|
|
| 21 |
OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values |
OPTIMUS: predicting multivariate outcomes in Alzheimer's disease using multi-modal data and explainable AI. |
predictive model, multimodal |
|
|
| 22 |
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models |
Survey paper: an in-depth analysis of the effectiveness and efficiency techniques of state space models (SSMs). |
Mamba, SSM, state space model |
|
|
| 23 |
Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogeneous Federated Learning |
Proposes an on-device knowledge-distillation approach for heterogeneous federated learning, enabling participation by weak clients. |
distillation |
|
|
| 24 |
Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium |
Reveals the statistical limits of aligning LLMs with human preferences: from the Condorcet paradox to Nash equilibrium. |
reinforcement learning, large language model |
|
|
| 25 |
Residual Policy Gradient: A Reward View of KL-regularized Objective |
Proposes Residual Policy Gradient (RPG), extending residual Q-learning to policy-gradient methods for policy customization. |
reinforcement learning, imitation learning |
|
|