| 1 |
UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function |
Proposes UNA, which unifies the RLHF/PPO, DPO, and KTO alignment methods through a generalized implicit reward function.
PPO RLHF DPO |
|
|
| 2 |
Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning |
Proposes the iAC framework, which uses optimization solution functions as deterministic policies for offline reinforcement learning to improve robustness.
reinforcement learning offline RL
|
|
| 3 |
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis |
Proposes sLAVA, a parameter-efficient quantized mixture-of-experts vision-language model for semiconductor electron micrograph analysis.
teacher-student multimodal instruction following |
|
|
| 4 |
Generative Verifiers: Reward Modeling as Next-Token Prediction |
Proposes Generative Verifiers (GenRM), which use a next-token prediction objective to improve LLM reasoning performance.
DPO large language model chain-of-thought |
|
|
| 5 |
Simultaneous Training of First- and Second-Order Optimizers in Population-Based Reinforcement Learning |
Proposes a method for simultaneously training first- and second-order optimizers in population-based reinforcement learning.
reinforcement learning TD3 |
|
|
| 6 |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models |
Proposes distilling Transformers into hybrid linear-RNN models and accelerating inference with hardware-aware speculative decoding.
Mamba distillation |
✅ |
|
| 7 |
Unsupervised-to-Online Reinforcement Learning |
Proposes unsupervised-to-online reinforcement learning (U2O RL) to address the limitations of offline-to-online reinforcement learning.
reinforcement learning offline RL |
|
|
| 8 |
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning |
Instruct-SkillMix: a powerful automated pipeline for LLM instruction tuning that generates high-quality SFT data at low cost.
PPO DPO instruction following |
|
|
| 9 |
Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation |
Proposes DP-SAD, which learns differentially private diffusion models via stochastic adversarial distillation to improve generation quality.
distillation |
|
|
| 10 |
What makes math problems hard for reinforcement learning: a case study |
Studies the challenges reinforcement learning faces in finding rare high-reward instances, using conjectures from combinatorial group theory as a case study.
reinforcement learning |
|
|
| 11 |
On latent dynamics learning in nonlinear reduced order modeling |
Proposes a latent dynamics model (LDM) for nonlinear reduced-order modeling, improving solution accuracy for parameterized PDEs.
latent dynamics |
|
|
| 12 |
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning |
Proposes exploiting approximate symmetry to solve multi-agent reinforcement learning problems efficiently.
reinforcement learning |
|
|
| 13 |
Dynamic operator management in meta-heuristics using reinforcement learning: an application to permutation flowshop scheduling problems |
Proposes a reinforcement-learning-based dynamic operator management framework, applied to permutation flowshop scheduling problems.
reinforcement learning |
|
|
| 14 |
Learning Granularity Representation for Temporal Knowledge Graph Completion |
Proposes the LGRe model, which enhances temporal knowledge graph completion with multi-granularity temporal representations.
representation learning TAMP |
✅ |
|
| 15 |
DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing |
Proposes a DRL-based federated self-supervised learning algorithm for task offloading and resource allocation, optimizing ISAC-enabled vehicle edge computing.
DRL |
|
|
| 16 |
Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction |
Proposes an explainable hierarchical urban representation learning model for commuting flow prediction.
representation learning |
|
|