| 1 |
Combating Visual Neglect and Semantic Drift in Large Multimodal Models for Enhanced Cross-Modal Retrieval |
Proposes the SSA-ME framework, which uses saliency-aware modeling to address visual neglect and semantic drift in LMMs for cross-modal retrieval. |
representation learning, contrastive learning, multimodal |
|
|
| 2 |
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding |
Proposes the OmniVTG dataset and a self-correcting CoT training paradigm to improve open-world video temporal grounding performance. |
reinforcement learning, large language model, multimodal |
✅ |
|
| 3 |
TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual Media |
TopoMamba: a topology-aware scanning and fusion framework for segmenting heterogeneous medical visual media. |
Mamba, SSM |
|
|
| 4 |
A Systematic Post-Train Framework for Video Generation |
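Proposes a post-training framework for video generation that improves generation quality, temporal consistency, and instruction following. |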
reinforcement learning, RLHF, instruction following |
|
|
| 5 |
Improving Diversity in Black-box Few-shot Knowledge Distillation |
Proposes an adaptive-diversity method for black-box few-shot knowledge distillation that improves student model accuracy. |
distillation |
✅ |
|
| 6 |
Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision Models |
Proposes Vision SmolMamba, which uses spike-guided token pruning to build efficient spiking state-space vision models. |
Mamba |
|
|
| 7 |
DualGeo: A Dual-View Framework for Worldwide Image Geo-localization |
DualGeo: a dual-view framework for worldwide image geo-localization that improves localization accuracy. |
contrastive learning, multimodal |
✅ |
|
| 8 |
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation |
Canonical knowledge distillation proves surprisingly effective for semantic segmentation. |
distillation |
|
|
| 9 |
DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing |
DDA-Thinker: decoupled dual-atomic reinforcement learning for reasoning-driven image editing. |
reinforcement learning |
|
|