| 1 |
Quantitative Video World Model Evaluation for Geometric-Consistency |
Proposes PDI-Bench for quantitatively evaluating the geometric consistency of video generation models. |
world model world models physically plausible |
✅ |
|
| 2 |
EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding |
Proposes the EARL framework to strengthen egocentric interaction reasoning and pixel-level grounding. |
reinforcement learning egocentric egocentric vision |
|
|
| 3 |
EponaV2: Driving World Model with Comprehensive Future Reasoning |
EponaV2: a driving world model with comprehensive future reasoning that improves autonomous-driving planning. |
flow matching world model world models |
|
|
| 4 |
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer |
SANA-WM: an efficient minute-scale world model built on a hybrid linear diffusion Transformer. |
world model world models linear attention |
|
|
| 5 |
FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery |
FactorizedHMR: a hybrid framework for video human mesh recovery that improves robustness under occlusion and weak depth. |
flow matching classifier-free guidance human mesh recovery |
|
|
| 6 |
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke |
Proposes a Vision-Core guided contrastive learning method for balanced multimodal stroke prognosis prediction. |
contrastive learning large language model multimodal |
|
|
| 7 |
SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding |
Proposes SceneParser for interaction-oriented hierarchical scene parsing, improving visual semantic understanding. |
curriculum learning scene understanding open-vocabulary |
|
|
| 8 |
MambaRain: Multi-Scale Mamba-Attention Framework for 0-3 Hour Precipitation Nowcasting |
MambaRain: a multi-scale precipitation nowcasting framework that combines Mamba with attention. |
Mamba representation learning spatiotemporal |
|
|
| 9 |
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation |
Proposes Causal Forcing++, which achieves frame-level 2-step autoregressive diffusion distillation to accelerate interactive video generation. |
world model world models distillation |
✅ |
|
| 10 |
Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation |
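Proposes a semantic-prior learning method based on hierarchical knowledge distillation to stabilize point-supervised infrared small target detection. |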
distillation foundation model |
✅ |
|
| 11 |
EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration |
EverAnimate: minute-scale human animation generation via latent flow restoration. |
flow matching human motion |
|
|
| 12 |
SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition |
SurgicalMamba: a dual-path SSD with state regramming for online surgical phase recognition. |
Mamba SSM |
✅ |
|
| 13 |
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning |
Proposes the CLVR framework, which improves complex visual generation through closed-loop verified reasoning. |
reinforcement learning distillation multimodal |
|
|
| 14 |
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO |
Proposes RAVEN to address insufficient quality in long video generation. |
reinforcement learning distillation |
|
|
| 15 |
Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation |
Proposes Delta Forcing, which steers interactive autoregressive video generation with an adaptive trust region to improve temporal consistency. |
world model world models |
|
|
| 16 |
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration |
KVPO: ODE-native GRPO based on KV semantic exploration for autoregressive video alignment. |
reinforcement learning flow matching |
|
|
| 17 |
PanoWorld: Geometry-Consistent Panoramic Video World Modeling |
PanoWorld: a geometry-consistent panoramic video world modeling method that generates realistic panoramic videos from a single image and text. |
world model world models geometric consistency |
✅ |
|
| 18 |
Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction |
Proposes IA-JEPA, a model with interaction-aware masking, to improve causal video prediction. |
world model world models JEPA |
|
|
| 19 |
EgoExo-WM: Unlocking Exo Video for Ego World Models |
EgoExo-WM: uses exocentric video to enhance egocentric world models. |
world model world models egocentric |
|
|
| 20 |
ReactiveGWM: Steering NPC in Reactive Game World Models |
Proposes ReactiveGWM, a reactive game world model with controllable NPCs. |
world model world models |
|
|
| 21 |
Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models |
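Proposes Social-Mamba, which uses state-space models to forecast crowd trajectories efficiently and address the challenge of modeling social interactions. |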
flow matching Mamba egocentric |
✅ |
|