| 1 |
The Perfect Blend: Redefining RLHF with Mixture of Judges |
Proposes Constrained Generative Policy Optimization (CGPO), based on a mixture of judges, to improve RLHF performance in multi-task learning. |
reinforcement learning PPO RLHF |
|
|
| 2 |
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models |
RouterDC: a query-based router trained with dual contrastive learning for assembling large language models |
contrastive learning large language model |
✅ |
|
| 3 |
Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models |
Proposes the FibecFed framework, which uses Fisher information to fine-tune large language models efficiently under federated learning. |
curriculum learning large language model |
|
|
| 4 |
A SSM is Polymerized from Multivariate Time Series |
Proposes Poly-Mamba to address complex dependencies in multivariate time series modeling |
Mamba SSM state space model |
✅ |
|
| 5 |
Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning |
Proposes the We-DRIVE-U algorithm to address dynamics uncertainty in off-dynamics reinforcement learning |
reinforcement learning |
|
|
| 6 |
Collaborative Knowledge Distillation via a Learning-by-Education Node Community |
Proposes the LENC framework to address collaborative knowledge distillation |
distillation |
|
|
| 7 |
Whole-Graph Representation Learning For the Classification of Signed Networks |
Proposes two whole-graph representation learning methods, SG2V and WSGCN, for signed network classification. |
representation learning |
|
|
| 8 |
HYDRA-FL: Hybrid Knowledge Distillation for Robust and Accurate Federated Learning |
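Proposes HYDRA-FL, which uses hybrid knowledge distillation to improve the robustness and accuracy of federated learning under heterogeneous data and attacks |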
distillation |
|
|
| 9 |
TSI: A Multi-View Representation Learning Approach for Time Series Forecasting |
TSI: a multi-view representation learning approach for time series forecasting |
representation learning |
|
|