| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | ZeroFlood: A Geospatial Foundation Model for Data-Efficient Flood Susceptibility Mapping | A geospatial foundation model for data-efficient flood susceptibility mapping. | representation learning, foundation model |
| 2 | Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI | Adapts interleaved encoders with PPO for language-guided reinforcement learning in BabyAI. | reinforcement learning, deep reinforcement learning, PPO |
| 3 | Debiasing Reward Models by Representation Learning with Guarantees | Proposes a representation-learning-based debiasing method to improve the robustness of reward models. | reinforcement learning, representation learning, large language model |
| 4 | Lightweight Robust Direct Preference Optimization | Proposes DPO-PRO, which improves DPO's performance under noise via lightweight distributionally robust optimization. | DPO, direct preference optimization, large language model |
| 5 | On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning | DMARL-RSA reveals the limitations of decentralized learnable reward shaping in cooperative multi-agent reinforcement learning. | reinforcement learning, reward shaping |
| 6 | GIFT: Group-relative Implicit Fine Tuning Integrates GRPO with DPO and UNA | Proposes the GIFT framework, combining the strengths of GRPO, DPO, and UNA for efficient LLM alignment. | reinforcement learning, PPO, DPO |
| 7 | The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation | Proposes a reinforcement learning method based on max@k optimization to improve LLM performance under Best-of-N sampling. | reinforcement learning, large language model |
| 8 | Offline Preference Optimization via Maximum Marginal Likelihood Estimation | Proposes MMPO, an offline preference optimization method based on maximum marginal likelihood estimation that simplifies the LLM alignment pipeline. | reinforcement learning, RLHF, large language model |
| 9 | Learning to Reason Efficiently with Discounted Reinforcement Learning | Proposes an efficient reasoning method based on discounted reinforcement learning that shortens reasoning chains while preserving accuracy. | reinforcement learning |
| 10 | Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts | Proposes a routing-aware resampling method to stabilize reinforcement learning training of MoE models. | reinforcement learning |
| 11 | Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks | Dynamic knowledge distillation for personalized federated intrusion detection in heterogeneous IoT networks. | distillation |
| 12 | Coupled Flow Matching | Proposes Coupled Flow Matching (CPFM) for controllable dimensionality reduction and high-fidelity reconstruction. | flow matching |