| 20 |
PAMF: Prior-Aware Multimodal Fusion for Incomplete Time Series Data |
Proposes PAMF to address missing data in multimodal time series |
flow matching multimodal |
|
|
| 21 |
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following |
Proposes MDP-GRPO to address instability in multi-constraint instruction following |
reinforcement learning instruction following |
|
|
| 22 |
RREDCoT: Segment-Level Reward Redistribution for Reasoning Models |
Proposes RREDCoT to address the delayed-reward problem in reasoning models |
reinforcement learning chain-of-thought |
|
|
| 23 |
OPRD: On-Policy Representation Distillation |
Proposes OPRD to address the limitations of existing distillation methods |
distillation |
✅ |
|
| 24 |
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning |
Proposes MR.Q, a representation-learning-based algorithm, to address the scalability of multitask deep reinforcement learning |
reinforcement learning deep reinforcement learning world model |
|
|
| 25 |
Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents |
Proposes autoregressive diffusion world models to address off-policy evaluation of LLM agents |
world model world models large language model |
|
|
| 26 |
Spatiotemporal Imputation with Graph-Informed Flow Matching |
Proposes the GiFlow framework to address missing spatiotemporal data |
flow matching spatiotemporal |
✅ |
|
| 27 |
HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care |
Proposes HoT-SSM to address higher-order temporal reasoning over healthcare knowledge graphs |
SSM state space model representation learning |
|
|
| 28 |
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning |
Proposes uncertainty-aware, LLM-guided policy shaping to address the sparse-reward problem |
reinforcement learning PPO large language model |
|
|
| 29 |
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation |
Proposes compressing reasoning traces to make knowledge distillation more efficient |
distillation chain-of-thought |
|
|
| 30 |
Capturing non-Markovian dynamics in non-equilibrium stochastic systems using flow matching |
Proposes a flow matching method to capture non-Markovian dynamics |
flow matching |
|
|
| 31 |
Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory |
Proposes principles of deep representation learning to address the interpretability of deep learning models |
representation learning |
|
|
| 32 |
Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning |
Proposes graph-structured reinforcement learning to optimize football corner-kick tactics |
reinforcement learning |
|
|
| 33 |
Discrete Causal Representations from Heterogeneous Domains: A Bayesian Approach with Social Survey Applications |
Proposes a Bayesian approach to learning discrete causal representations from heterogeneous data |
representation learning multimodal |
|
|
| 34 |
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward |
Proposes improved multi-agent reinforcement learning to address reward bias |
reinforcement learning |
|
|
| 35 |
Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification |
Proposes a KL-regularized method to address reinforcement learning under model misspecification |
reinforcement learning |
|
|
| 36 |
Learn to Match: Two-Sided Matching with Temporally Extended Feedback |
Proposes a two-sided matching framework based on temporally extended feedback to address dynamic matching |
reinforcement learning PPO |
|
|