| 13 |
Next Forcing: Causal World Modeling with Multi-Chunk Prediction |
Proposes Next Forcing to address slow training and inefficient inference in video generation |
world model world models world action model |
|
|
| 14 |
Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency |
Proposes the ImageTime benchmark to address temporal consistency in visual world modeling |
world model world models spatiotemporal |
|
|
| 15 |
LAFP: Preserving Latent Action Structure in Latent Policy Learning via Flow Matching |
Proposes LAFP to address the collapse of multimodal action distributions |
policy learning imitation learning behavior cloning |
|
|
| 16 |
ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations |
Proposes the ARM model to unify image understanding, generation, and editing tasks |
reinforcement learning multimodal |
✅ |
|
| 17 |
Mean Flow Distillation: Robust and Stable Distillation for Flow Matching Models |
Proposes Mean Flow Distillation to address the computational overhead of flow matching models |
flow matching distillation |
|
|
| 18 |
SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning |
Proposes SCAIL-2 to address information loss in controlled character animation |
DPO character animation |
✅ |
|
| 19 |
Kwai Keye-VL-2.0 Technical Report |
Proposes Kwai Keye-VL-2.0 to address long-video understanding and agent collaboration |
distillation foundation model multimodal |
|
|
| 20 |
FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model |
Proposes FADA to address the shortage of prenatal ultrasound personnel in low-income countries |
MAE distillation foundation model |
✅ |
|
| 21 |
Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio |
Proposes SAM-Audio to address forgetting in audio-visual incremental learning |
distillation multimodal |
|
|
| 22 |
Efficient RWKV-based Representation Learning for 3D Point Clouds |
Proposes P-RWKV to address the capture of local geometric structure in 3D point cloud representation learning |
representation learning |
|
|
| 23 |
Benchmarking stereo reconstruction for 3D printable Martian terrain models |
Proposes a stereo reconstruction method to address the challenges of Martian terrain modeling |
MAE stereo depth |
|
|