| # | Title | Summary | Keywords | Note |
|---|-------|---------|----------|------|
| 1 | Causal Foundation Models: Disentangling Physics from Instrument Properties | Proposes a causal foundation model that disentangles physical phenomena from instrument properties, improving time-series generalization. | representation learning, contrastive learning, foundation model | |
| 2 | Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning | Reveals supply-chain vulnerabilities in deep reinforcement learning and proposes component-level and post-training backdoor attacks. | reinforcement learning, deep reinforcement learning, DRL | |
| 3 | Representation learning with a transformer by contrastive learning for money laundering detection | Proposes a Transformer-based representation learning method trained with contrastive learning for money laundering detection. | representation learning, contrastive learning | |
| 4 | 2048: Reinforcement Learning in a Delayed Reward Environment | Proposes Horizon-DQN to address reinforcement learning under delayed rewards in the game 2048, substantially improving performance. | reinforcement learning, PPO, curriculum learning | |
| 5 | Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions | Proposes accelerating online reinforcement learning with auxiliary start state distributions, improving sample efficiency. | reinforcement learning, affordance | |
| 6 | Critiques of World Models | Proposes a general world-model architecture built on hierarchical, multi-level, and mixed representations, aimed at physical, agentic, and nested AGI systems. | world model | |
| 7 | Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation | Proposes the SEED framework, which mitigates hallucinations in large vision-language models via self-evolving distillation. | distillation | |
| 8 | Information-Guided Diffusion Sampling for Dataset Distillation | Proposes information-guided diffusion sampling for dataset distillation. | distillation | |
| 9 | wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models | Proposes wd1 to improve the reasoning ability of diffusion language models. | reinforcement learning, large language model | |
| 10 | Replacing thinking with tool usage enables reasoning in small language models | Replaces thinking with tool usage, enabling reasoning in small language models. | reinforcement learning, large language model | |
| 11 | When do World Models Successfully Learn Dynamical Systems? | Proposes a world-model-based framework for learning dynamical systems that effectively simulates physical systems. | world model | |
| 12 | Going Beyond Heuristics by Imposing Policy Improvement as a Constraint | Proposes the HEPO algorithm, which incorporates heuristic information by imposing policy improvement as a constraint, reducing the burden of hand-designed reward functions. | reinforcement learning, reward design | ✅ |
| 13 | Interpretable Reward Modeling with Active Concept Bottlenecks | Proposes an interpretable reward-modeling framework based on active concept bottlenecks, improving reward-model transparency and sample efficiency. | preference learning, RLHF | |