| # | Title | Summary | Tags | ✅ |
| --- | --- | --- | --- | --- |
| 1 | Low-Rank Similarity Mining for Multimodal Dataset Distillation | Proposes LoRS for multimodal dataset distillation, tackling similarity learning over image-text pairs. | contrastive learning, distillation, multimodal | ✅ |
| 2 | Strategically Conservative Q-Learning | Proposes Strategically Conservative Q-learning (SCQ) to address overly conservative value estimates in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | ✅ |
| 3 | Aligning Agents like Large Language Models | Borrows the LLM training paradigm to improve the generality and robustness of agents in 3D environments. | reinforcement learning, large language model | ✅ |
| 4 | Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models | Proposes SPAC, a provable and scalable offline alignment method for language models. | reinforcement learning, offline RL, offline reinforcement learning | |
| 5 | Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning | Proposes MOMBO, a moment-matching-based model-based offline RL algorithm that makes deterministic uncertainty propagation more efficient. | reinforcement learning, offline reinforcement learning | |
| 6 | Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models | Chimera: effectively models multivariate time series with 2-dimensional state space models. | Mamba, SSM, state space model | |
| 7 | TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification | TSCMamba: a multivariate time-series classification method combining multi-view learning with Mamba. | Mamba, state space model | |
| 8 | Road Network Representation Learning with the Third Law of Geography | Proposes a road network representation learning framework based on the Third Law of Geography, improving road-segment representations on downstream tasks. | representation learning, contrastive learning | |
| 9 | Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking | Proposes continuous action masking, improving RL efficiency by focusing on the relevant action space. | reinforcement learning, PPO | |
| 10 | Open Problem: Active Representation Learning | Proposes an active representation learning framework, addressing joint exploration and representation learning in partially observable environments. | representation learning | |
| 11 | Mitigating Bias in Dataset Distillation | Proposes a kernel-density-estimation-based reweighting method to mitigate bias amplification in dataset distillation. | distillation | |
| 12 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | ATraDiff: accelerates online reinforcement learning with generated trajectories, addressing sparse rewards. | reinforcement learning | |
| 13 | Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment | Proposes Spread Preference Annotation, aligning LLMs efficiently with only a small amount of data. | preference learning, large language model | |
| 14 | What is Dataset Distillation Learning? | Studies what dataset distillation learns, revealing the properties of distilled data and how information is stored in it. | distillation | |
| 15 | Multi-Agent Imitation Learning: Value is Easy, Regret is Hard | For multi-agent imitation learning, proposes the MALICE and BLADES algorithms based on minimizing the regret gap. | imitation learning | |
| 16 | STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning | Proposes STEMO, a multi-objective reinforcement learning model for early spatio-temporal forecasting that balances accuracy and timeliness. | reinforcement learning | |
| 17 | Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning | Proposes Mini HoK, a lightweight environment to advance multi-agent reinforcement learning research and algorithm innovation. | reinforcement learning | ✅ |
| 18 | Breeding Programs Optimization with Reinforcement Learning | Proposes an RL-based method for optimizing breeding programs, improving crop genetic gain. | reinforcement learning | |
| 19 | Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning | Proposes an RL-based dynamic trend filtering method for capturing abrupt trend changes in time series. | reinforcement learning | |
| 20 | Transductive Off-policy Proximal Policy Optimization | Proposes Transductive Off-policy PPO (ToPPO), improving PPO's utilization of off-policy data. | reinforcement learning, PPO | |
| 21 | Improving Actor-Critic Training with Steerable Action-Value Approximation Errors | Proposes Utility Soft Actor-Critic (USAC), improving actor-critic training via steerable action-value approximation errors. | reinforcement learning, deep reinforcement learning | |
| 22 | How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach | Proposes the CATY-IRL algorithm, addressing inverse reinforcement learning over large state spaces in linear MDPs. | reinforcement learning, inverse reinforcement learning | |
| 23 | Reflective Policy Optimization | Proposes Reflective Policy Optimization (RPO), improving the sample efficiency of on-policy reinforcement learning. | reinforcement learning, PPO | ✅ |