| 1 |
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models |
Pram: solves multi-commodity flow problems with multimodal language models, balancing optimization quality and efficiency
reinforcement learning multimodal |
|
|
| 2 |
Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss |
Proposes a contrastive learning method with a Jaccard-score-based sigmoid loss for multi-label ECG classification
contrastive learning large language model multimodal |
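The paper's exact objective is not reproduced in this digest. As a rough, hypothetical illustration of the idea, a Jaccard-score-based sigmoid loss can use the label-set overlap of two examples as a soft target for their embedding-similarity logit (all function and variable names below are assumptions, not the authors' code):

```python
import math

def jaccard(a, b):
    """Jaccard similarity between two binary multi-label vectors."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0

def jaccard_sigmoid_loss(sim, labels):
    """Pairwise sigmoid loss whose soft target is the Jaccard overlap
    of the pair's label sets (hypothetical sketch, not the paper's code)."""
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            t = jaccard(labels[i], labels[j])        # soft target in [0, 1]
            p = 1.0 / (1.0 + math.exp(-sim[i][j]))   # sigmoid of similarity logit
            # binary cross-entropy against the soft Jaccard target
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count
```

Under this sketch, a pair with identical label sets and a high similarity logit incurs a low loss, while a disjoint pair with the same logit is penalized heavily.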
|
|
| 3 |
A Multimodal Conditional Mixture Model with Distribution-Level Physics Priors |
Proposes a physics-informed multimodal conditional mixture model built on mixture density networks, addressing multimodal distribution learning in scientific computing.
flow matching multimodal |
|
|
| 4 |
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization |
RePO: bridges on-policy learning and off-policy knowledge via rephrasing policy optimization, improving LLM domain-knowledge alignment.
reinforcement learning policy learning large language model |
|
|
| 5 |
Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling |
Proposes BNRM, mitigating reward hacking in RLHF via Bayesian non-negative reward modeling
reinforcement learning RLHF large language model |
|
|
| 6 |
Enhancing Ride-Hailing Forecasting at DiDi with Multi-View Geospatial Representation Learning from the Web |
Proposes MVGR-Net, improving ride-hailing demand forecasting accuracy via multi-view geospatial representation learning
representation learning large language model |
|
|
| 7 |
Semi-Supervised Cross-Domain Imitation Learning |
Proposes a semi-supervised cross-domain imitation learning algorithm to address scarce expert data
policy learning imitation learning |
✅ |
|
| 8 |
Driving Reaction Trajectories via Latent Flow Matching |
Proposes LatentRxnFlow, modeling chemical reaction trajectories via latent flow matching to make reaction prediction more transparent and diagnosable.
flow matching latent dynamics |
|
|
| 9 |
Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards |
Proposes an asymmetric prompt-weighting RL method that accelerates policy learning under verifiable rewards
reinforcement learning |
|
|
| 10 |
General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies |
Proposes a general, flexible f-divergence that improves offline RL on datasets with low stochasticity and diverse behavior policies.
offline RL |
|
|
| 11 |
Resource-Efficient Model-Free Reinforcement Learning for Board Games |
Proposes a resource-efficient model-free reinforcement learning algorithm for decision-making in board games.
reinforcement learning |
|
|
| 12 |
SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios |
SimuScene: training and benchmarking LLM code generation for simulating physical scenarios
reinforcement learning large language model |
|
|
| 13 |
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training |
VESPO: variational sequence-level soft policy optimization for stable off-policy LLM training
reinforcement learning large language model |
✅ |
|
| 14 |
LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization |
Proposes the PiT-PO framework, adapting LLMs via reinforcement learning to discover scientific equations.
reinforcement learning large language model |
|
|
| 15 |
Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering |
Proposes Control Reinforcement Learning, enabling token-level mechanistic analysis via learned SAE feature steering.
reinforcement learning |
|
|
| 16 |
Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning |
Proposes Binary Flow Matching, achieving robust learning for binary-data generative models via prediction-loss space alignment.
flow matching |
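As background for the flow-matching entries in this digest: standard conditional flow matching regresses the velocity of a straight-line path between noise and data. The paper's binary variant and its prediction-loss-space alignment are not reproduced here; this generic sketch (all names are assumptions) shows only the common training target:

```python
import random

def cfm_training_example(x1, seed=0):
    """Build one conditional flow-matching training example for a data
    vector x1: sample noise x0 and time t, form the linear interpolant
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
    Generic sketch; not the paper's binary-data formulation."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]      # Gaussian noise sample
    t = rng.random()                            # t ~ Uniform(0, 1)
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]  # constant velocity of the path
    return x0, t, xt, v_target
```

A velocity network would then be trained to predict `v_target` from `(xt, t)`; the straight-line path makes the target constant in `t`.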
|
|
| 17 |
OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories |
Proposes OSIL, learning offline safe imitation policies with safety inferred from non-preferred trajectories
policy learning imitation learning |
|
|