| 1 |
Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding |
Mamba-VMR: augments multimodal queries with generated videos for precise temporal grounding |
Mamba multimodal |
|
|
| 2 |
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation |
Proposes SpatialReward to improve fine-grained spatial consistency in text-to-image generation |
reinforcement learning spatial relationship visual grounding |
|
|
| 3 |
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model |
daVinci-MagiHuman: a fast audio-video generative foundation model based on a single-stream Transformer |
distillation foundation model |
|
|
| 4 |
PPGL-Swarm: Integrated Multimodal Risk Stratification and Hereditary Syndrome Detection in Pheochromocytoma and Paraganglioma |
PPGL-Swarm: multimodal risk stratification and hereditary syndrome detection for pheochromocytoma and paraganglioma |
reinforcement learning multimodal |
|
|
| 5 |
ALADIN: Attribute-Language Distillation Network for Person Re-Identification |
Proposes ALADIN, an attribute-language distillation network that improves fine-grained feature learning for person re-identification. |
representation learning distillation multimodal |
|
|
| 6 |
Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement |
Proposes VFLM, which uses visual feedback to iteratively refine text layout generation, improving readability and aesthetics. |
reinforcement learning large language model multimodal |
✅ |
|
| 7 |
Multi-View Deformable Convolution Meets Visual Mamba for Coronary Artery Segmentation |
Proposes MDSVM-UNet, which combines multi-view deformable convolution with visual Mamba for coronary artery segmentation |
Mamba SSM state space model |
|
|
| 8 |
Clinical Graph-Mediated Distillation for Unpaired MRI-to-CFI Hypertension Prediction |
Proposes a clinical graph-mediated distillation method for hypertension prediction from unpaired MRI and fundus images. |
distillation multimodal |
✅ |
|
| 9 |
Image-Conditioned Adaptive Parameter Tuning for Visual Odometry Frontends |
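Proposes image-conditioned adaptive parameter tuning for visual odometry frontends, improving performance on resource-constrained robots. |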
reinforcement learning visual odometry |
|
|
| 10 |
A Latent Representation Learning Framework for Hyperspectral Image Emulation in Remote Sensing |
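Proposes a latent representation learning framework for hyperspectral image emulation, accelerating remote sensing application development. |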
representation learning HSI |
|
|
| 11 |
Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation |
Proposes an adaptive video distillation framework that addresses oversaturation and temporal collapse in few-step generation |
distillation physically plausible |
|
|
| 12 |
ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints |
Proposes ACPO, which uses asymmetric constraint optimization to address likelihood displacement in vision-language alignment |
DPO direct preference optimization multimodal |
|
|
| 13 |
WorldCache: Content-Aware Caching for Accelerated Video World Models |
Proposes WorldCache, which accelerates video world model inference through perception-constrained dynamic caching. |
world model |
✅ |
|
| 14 |
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models |
Omni-WorldBench: a comprehensive interaction-centric evaluation benchmark for world models |
world model |
|
|
| 15 |
Manifold-Aware Exploration for Reinforcement Learning in Video Generation |
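Proposes SAGE-GRPO, which improves the stability and quality of reinforcement learning for video generation through manifold-aware exploration. |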
reinforcement learning |
✅ |
|
| 16 |
Rethinking SAR ATR: A Target-Aware Frequency-Spatial Enhancement Framework with Noise-Resilient Knowledge Guidance |
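Proposes a target-aware frequency-spatial enhancement framework that improves target recognition accuracy for SAR images under noisy conditions. |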
representation learning teacher-student distillation |
|
|
| 17 |
From Part to Whole: 3D Generative World Model with an Adaptive Structural Hierarchy |
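Proposes a 3D generative world model with an adaptive structural hierarchy, addressing structural complexity and generalization in single-image 3D generation. |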
world model |
|
|