| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | Contextual Online Uncertainty-Aware Preference Learning for Human Feedback | Proposes a contextual online uncertainty-aware preference learning framework for optimizing models from human feedback. | reinforcement learning, preference learning, RLHF |
| 2 | Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors | Proposes the Preference Vector framework to balance helpfulness and harmlessness in LLMs. | reinforcement learning, RLHF, DPO |
| 3 | Supervised Pretraining for Material Property Prediction | Proposes a supervised pretraining method that leverages material category information to improve material property prediction. | representation learning, MAE, foundation model |
| 4 | HyperController: A Hyperparameter Controller for Fast and Stable Training of Reinforcement Learning Neural Networks | Proposes HyperController, a hyperparameter controller that accelerates hyperparameter optimization for reinforcement learning neural networks. | reinforcement learning |
| 5 | Attention to Detail: Fine-Scale Feature Preservation-Oriented Geometric Pre-training for AI-Driven Surrogate Modeling | Proposes a geometric pretraining method oriented toward fine-scale feature preservation for AI-driven surrogate modeling. | representation learning, foundation model |
| 6 | Swapped Logit Distillation via Bi-level Teacher Alignment | Proposes a swapped logit distillation method based on bi-level teacher alignment to improve knowledge transfer. | distillation |