| 1 |
Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling |
Proposes SPANE to address dynamic VM scheduling in multi-NUMA environments |
reinforcement learning, deep reinforcement learning |
|
|
| 2 |
In-context Ranking Preference Optimization |
Proposes the IRPO framework, which optimizes LLMs through in-context ranking preferences to improve performance on ranking tasks. |
DPO, direct preference optimization, large language model |
|
|
| 3 |
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL |
Think2SQL: improves Text2SQL performance by reinforcing LLM reasoning capabilities |
reinforcement learning, large language model |
|
|
| 4 |
Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding |
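Proposes a Bayesian preference learning framework that integrates response time and attention duration for multiple criteria decision aiding. |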
preference learning |
|
|
| 5 |
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment |
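Proposes Dynamic Contrastive Skill Learning (DCSL) to address the limited flexibility and generalization of skill learning in reinforcement learning. |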
reinforcement learning, contrastive learning |
|
|