| # | Title | Summary | Keywords | |
|---|-------|---------|----------|---|
| 1 | Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment | Proposes a theoretical framework to address the modality gap in multimodal contrastive learning | contrastive learning, multimodal | |
| 2 | SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts | Proposes SPEC-RL to accelerate the rollout phase of on-policy reinforcement learning | reinforcement learning, PPO, large language model | ✅ |
| 3 | General Exploratory Bonus for Optimistic Exploration in RLHF | Proposes the General Exploratory Bonus (GEB) to address bias in optimistic exploration for RLHF | reinforcement learning, RLHF, large language model | |
| 4 | Causally-Enhanced Reinforcement Policy Optimization | Proposes causally-enhanced policy optimization to address reward hacking in reinforcement learning | PPO, reward shaping, large language model | |
| 5 | Knowledge distillation through geometry-aware representational alignment | Proposes a geometry-aware representational alignment method for knowledge distillation that improves language model performance | distillation, instruction following | |
| 6 | CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning | Proposes CrystalGym, a new benchmark environment for materials discovery with reinforcement learning | reinforcement learning, large language model | |
| 7 | Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving | Proposes the POC framework to optimize re-solving decisions for MILP | reinforcement learning, policy learning | |
| 8 | Unleashing Flow Policies with Distributional Critics | Proposes distributional flow critics to strengthen flow policies in offline reinforcement learning | reinforcement learning, flow matching, multimodal | |
| 9 | Factor Decorrelation Enhanced Data Removal from Deep Predictive Models | Proposes a factor-decorrelation-enhanced data removal method that improves deep predictive models under distribution shift | predictive model | |
| 10 | LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport | LOTFormer: doubly-stochastic linear attention via low-rank optimal transport | linear attention | |
| 11 | Flow Matching for Robust Simulation-Based Inference under Model Misspecification | Proposes the FMCPE framework, using flow matching to improve the robustness of SBI under model misspecification | flow matching | |
| 12 | LLM Interpretability with Identifiable Temporal-Instantaneous Representation | Proposes an LLM interpretability framework with identifiable temporal-instantaneous representations, improving concept-relation discovery | representation learning, large language model | |
| 13 | Two-Scale Latent Dynamics for Recurrent-Depth Transformers | For recurrent-depth Transformers, proposes an early-exit mechanism based on second-order step differences, improving efficiency and stability | latent dynamics | |
| 14 | Towards Monotonic Improvement in In-Context Reinforcement Learning | Proposes Context Value Informed ICRL to address performance degradation caused by context ambiguity in ICRL | reinforcement learning | ✅ |
| 15 | Learning without Global Backpropagation via Synergistic Information Distillation | Proposes the Synergistic Information Distillation (SID) framework to address the scalability bottleneck of backpropagation in deep learning | distillation | ✅ |
| 16 | C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning | Proposes C$^2$GSPG to address overconfidence in self-aware reasoning | reinforcement learning, large language model | ✅ |
| 17 | Impute-MACFM: Imputation based on Mask-Aware Flow Matching | Proposes Impute-MACFM, which uses Mask-Aware Flow Matching for more robust and efficient tabular data imputation, especially for longitudinal data | flow matching | |
| 18 | Tracing the Representation Geometry of Language Models from Pretraining to Post-training | Proposes spectral methods to probe the representation geometry of language models | DPO, large language model | |
| 19 | From Noise to Laws: Regularized Time-Series Forecasting via Denoised Dynamic Graphs | PRISM: regularizes time-series forecasting via denoised dynamic graphs for stable long-horizon prediction | MAE, physically plausible | |
| 20 | Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm | Proposes the Trust Region Reward Optimization (TRRO) framework to stabilize reward-function learning in inverse reinforcement learning | reinforcement learning, inverse reinforcement learning | |