| 25 |
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy |
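Proposes OneWM-VLA, which compresses each frame into a single token and uses a flow-matching objective to improve the long-horizon planning of vision-language-action (VLA) policies. |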
flow matching world model world models |
|
|
| 26 |
Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training |
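Proposes Sword, a world-model framework that uses dynamic latent bootstrapping and style augmentation to make VLA policy post-training more robust. |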
reinforcement learning world model world models |
|
|
| 27 |
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation |
Proposes ST-Gen4D, a framework that embeds 4D spatiotemporal cognition into a world model to achieve highly consistent 4D generation. |
world model world models spatiotemporal |
|
|
| 28 |
Learning Visual Feature-Based World Models via Residual Latent Action |
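Proposes a world model based on residual latent actions (RLA) that uses flow matching for efficient visual feature prediction and robot policy learning. |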
policy learning flow matching world model |
✅ |
|
| 29 |
Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness |
Proposes Pan-FM, a pan-organ foundation model with saliency-guided masking that addresses robustness to missing data in multimodal medical imaging. |
representation learning distillation foundation model |
|
|
| 30 |
ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning |
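Proposes ReasonEdit, which builds a large-scale chain-of-thought dataset and uses reinforcement learning for interpretable image-editing evaluation. |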
reinforcement learning large language model multimodal |
✅ |
|
| 31 |
GEM: Generating LiDAR World Model via Deformable Mamba |
Proposes GEM, a generative LiDAR world model built on deformable Mamba for high-fidelity simulation of environment dynamics. |
world model world models Mamba |
✅ |
|
| 32 |
Flow-OPD: On-Policy Distillation for Flow Matching Models |
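Proposes Flow-OPD, which uses on-policy distillation to address reward sparsity and gradient interference in multi-task alignment of flow-matching models. |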
flow matching distillation large language model |
|
|
| 33 |
EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction |
Proposes EmambaIR, an efficient visual state space model for event-guided image reconstruction. |
Mamba SSM state space model |
✅ |
|
| 34 |
Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning |
Proposes Sat3R, which achieves efficient satellite DSM reconstruction through RPC-aware depth fine-tuning. |
MAE monocular depth metric depth |
|
|
| 35 |
Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers |
Proposes Diffusion-APO, an algorithm that optimizes video diffusion models through trajectory-aware direct preference alignment. |
RLHF DPO direct preference optimization |
|
|
| 36 |
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs |
Proposes ShellfishNet, a benchmark dataset targeting robust recognition of shellfish species in complex underwater environments. |
SSM state space model large language model |
|
|
| 37 |
Breaking Spatial Uniformity: Prior-Guided Mamba with Radial Serialization for Lens Flare Removal |
Proposes DeflareMambav2, a prior-guided Mamba architecture with radial serialization for efficient lens flare removal. |
Mamba SSM state space model |
✅ |
|
| 38 |
VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network |
Proposes VIMCAN, a hybrid architecture that combines Mamba with cross-attention for efficient visual-inertial 3D human pose estimation. |
Mamba multimodal |
|
|
| 39 |
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning |
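Proposes BalCapRL, a framework that uses multi-objective reinforcement learning to improve the image-captioning quality of multimodal large language models (MLLMs). |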
reinforcement learning large language model multimodal |
|
|
| 40 |
Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action Recognition |
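Proposes a neurosymbolic framework for skeleton-based human action recognition that enables concept-driven logical reasoning and interpretability. |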
representation learning motion representation |
✅ |
|
| 41 |
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations |
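Proposes a one-step distillation method built on pretrained diffusion models (using their feature representations) that improves generation efficiency and image quality. |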
flow matching distillation |
|
|
| 42 |
SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models |
Proposes SARA, which uses semantically adaptive relational alignment to improve text adherence in video diffusion models. |
distillation foundation model |
|
|
| 43 |
RELO: Reinforcement Learning to Localize for Visual Object Tracking |
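Proposes RELO, a reinforcement-learning localization framework that replaces hand-crafted priors with reward-driven learning to improve visual object tracking. |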
reinforcement learning |
|
|
| 44 |
Towards multi-modal forgery representation learning for AI-generated video detection and localization |
Proposes a multimodal forgery representation learning framework for detecting and localizing AI-generated video. |
representation learning |
|
|
| 45 |
Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models |
Proposes CLP-DD, which uses a closed-form solution to achieve efficient dataset distillation for pretrained vision models. |
distillation |
|
|
| 46 |
PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition |
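Proposes PRIMED, which uses a biased-competition mechanism to achieve adaptive modality suppression in referring audio-visual segmentation. |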
contrastive learning multimodal |
|
|
| 47 |
Implicit Preference Alignment for Human Image Animation |
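Proposes Implicit Preference Alignment (IPA), a framework that tackles the poor quality of generated hand motion in human image animation. |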
reinforcement learning direct preference optimization |
✅ |
|