| # | Title | Summary | Keywords | Read |
|---|-------|---------|----------|:----:|
| 1 | Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments | Proposes a model-augmented adversarial inverse RL framework that improves sample efficiency in stochastic environments. | reinforcement learning, inverse reinforcement learning, reward shaping | |
| 2 | Exploring the Limitations of Mamba in COPY and CoT Reasoning | Analyzes Mamba's limitations on the COPY operation and CoT reasoning, revealing performance bottlenecks on specific tasks. | Mamba, linear attention, large language model | |
| 3 | MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents | Proposes MART: fine-tunes an MLLM as a retriever via interactive learning, improving multimodal retrieval for embodied agents. | preference learning, multimodal | ✅ |
| 4 | Predictive Coding for Decision Transformer | Proposes the Predictive Coding Decision Transformer (PCDT), improving performance on offline goal-conditioned RL tasks. | reinforcement learning, policy learning, offline reinforcement learning | |
| 5 | Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization | Proposes a vector-quantization-based input transformation that improves the robustness of deep RL against adversarial perturbations. | reinforcement learning, deep reinforcement learning | |
| 6 | Open-World Reinforcement Learning over Long Short-Term Imagination | Proposes LS-Imagine, which improves exploration efficiency in open-world RL through long short-term imagination. | reinforcement learning, world model, affordance | |
| 7 | SELU: Self-Learning Embodied MLLMs in Unknown Environments | Proposes SELU, using self-learning to improve embodied multimodal LLMs' understanding and decision-making in unknown environments. | reinforcement learning, large language model, multimodal | |
| 8 | Demystifying the Token Dynamics of Deep Selective State Space Models | Characterizes the token dynamics of deep selective state space models and proposes new ways to improve Mamba's performance. | Mamba, SSM, state space model | |
| 9 | Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems | Uses ring attractors for spatially aware decision-making, improving the performance of RL systems. | reinforcement learning, deep reinforcement learning, DRL | |
| 10 | Mathematical Formalism for Memory Compression in Selective State Space Models | Proposes a selective gating mechanism for memory compression in selective state space models, improving long-sequence modeling efficiency. | SSM, state space model | |
| 11 | Learning Code Preference via Synthetic Evolution | Proposes the CodeFavor framework, which learns code preferences from synthetically evolved data to improve code generation quality. | preference learning, large language model | |
| 12 | Robust Offline Imitation Learning from Diverse Auxiliary Data | Proposes ROIDA, addressing robustness when exploiting diverse auxiliary data in offline imitation learning. | imitation learning | ✅ |
| 13 | Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting | Proposes a flow matching model with novel probability paths for time-series forecasting, improving predictive performance. | flow matching | |
| 14 | Improving Node Representation by Boosting Target-Aware Contrastive Loss | Proposes Target-aware CL, improving node representation quality via target-aware contrastive learning. | representation learning, contrastive learning | |