| 1 |
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution |
Proposes an iterative self-evolution framework that aligns multimodal large language models without human annotation |
DPO direct preference optimization large language model |
|
|
| 2 |
Offline Reinforcement Learning for LLM Multi-Step Reasoning |
Proposes OREO, an offline reinforcement learning method for LLM multi-step reasoning |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 3 |
Mamba-based Deep Learning Approach for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography |
Proposes a Mamba-based deep learning method that uses multimodal wearable-device data to perform sleep staging without EEG |
Mamba multimodal |
|
|
| 4 |
SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification |
SGAC: a graph neural network framework for imbalanced and structure-aware AMP classification |
representation learning contrastive learning distillation |
|
|
| 5 |
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF |
Proposes FedRLHF, a convergence-guaranteed federated RLHF framework for privacy preservation and personalization. |
reinforcement learning policy learning RLHF |
|
|
| 6 |
SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch |
Proposes SORREL to address the learning-to-branch problem in MILP solving |
reinforcement learning offline reinforcement learning imitation learning |
|
|
| 7 |
Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning |
Proposes a novelty-guided data reuse method that improves the efficiency and diversity of multi-agent reinforcement learning |
reinforcement learning distillation |
|
|
| 8 |
Decoding fairness: a reinforcement learning perspective |
Decodes fairness behavior in the ultimatum game using reinforcement learning |
reinforcement learning imitation learning |
|
|
| 9 |
Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems |
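Proposes a multi-agent reinforcement learning method for sequential satellite assignment that significantly improves the efficiency of large-scale task assignment. |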
reinforcement learning |
|
|
| 10 |
Graph Structure Refinement with Energy-based Contrastive Learning |
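Proposes ECL-GSR, an energy-based contrastive learning framework for graph structure refinement that improves the node classification performance of graph neural networks. |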
contrastive learning |
|
|