| 1 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Drama: a Mamba-based state space model that improves the sample and parameter efficiency of model-based reinforcement learning. | reinforcement learning, world model, model-based RL | ✅ | |
| 2 | Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both | Proposes DRDO, which performs reward distillation and preference learning simultaneously to improve language model performance. | preference learning, RLHF, DPO | | |
| 3 | When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning | Proposes the MAGB benchmark datasets to systematically evaluate GNN and VLM methods for multimodal attributed graph learning. | representation learning, multimodal | ✅ | |
| 4 | Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Proposes DQO, which improves the multi-step reasoning ability of language models through direct Q-function optimization. | reinforcement learning, PPO, SAC | | |
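| 5 | On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning | Proposes a self-supervised representation learning method based on discriminative probabilistic modeling that improves contrastive learning performance. | representation learning, multimodal | ✅ | |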
| 6 | Parameter-Efficient Fine-Tuning of State Space Models | Proposes Sparse Dimension Tuning (SDT) for efficient fine-tuning of state space models (SSMs), improving performance. | Mamba, SSM, state space model | | |
| 7 | Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization | Reveals the likelihood displacement phenomenon in DPO and proposes the CHES metric to mitigate unintentional unalignment. | DPO, direct preference optimization | | |
| 8 | M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation | M$^3$-Impute: missing value imputation via mask-guided representation learning. | representation learning, MAE | | |
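| 9 | Zero-Shot Offline Imitation Learning via Optimal Transport | Proposes a zero-shot offline imitation learning method based on optimal transport that addresses the myopia of conventional methods. | imitation learning, world model | ✅ | |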
| 10 | DFM: Interpolant-free Dual Flow Matching | Proposes interpolant-free Dual Flow Matching (DFM), improving unsupervised anomaly detection performance. | flow matching | | |
| 11 | AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration | Surveys AI learning algorithms: deep learning, hybrid models, and large-scale model integration. | reinforcement learning, large language model | | |
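| 12 | Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control | Proposes Sequence Reinforcement Learning (SRL) to address the difficulty of control at low decision frequencies in continuous control. | reinforcement learning | | |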
| 13 | MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL | MAD-TD: model-augmented data stabilizes high-update-ratio reinforcement learning and improves sample efficiency. | reinforcement learning, deep reinforcement learning, world model | | |
| 14 | Distillation of Discrete Diffusion through Dimensional Correlations | Proposes a mixture model to address the slow sampling speed of discrete diffusion models. | distillation | ✅ | |
| 15 | DistDD: Distributed Data Distillation Aggregation through Gradient Matching | DistDD: distributed data distillation aggregation via gradient matching, reducing repeated communication in federated learning. | distillation | | |
| 16 | CYCLE: Cross-Year Contrastive Learning in Entity-Linking | Proposes CYCLE to address temporal performance degradation in entity linking. | contrastive learning | ✅ | |
| 17 | Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning | Proposes Kaleidoscope to address policy homogeneity in multi-agent reinforcement learning. | reinforcement learning | ✅ | |
| 18 | NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs | NextLocLLM: location semantics modeling and coordinate-based next-location prediction with LLMs. | predictive model, spatiotemporal | ✅ | |