| 1 |
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation |
Proposes mmMamba, which uses distillation to convert multimodal large language models into linear-complexity state space models. |
Mamba state space model distillation |
✅ |
|
| 2 |
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization |
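Proposes the Re-Align framework, which aligns vision language models through retrieval-augmented direct preference optimization and effectively mitigates cross-modal hallucination. |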
reinforcement learning RLHF DPO |
✅ |
|
| 3 |
S2C: Learning Noise-Resistant Differences for Unsupervised Change Detection in Multimodal Remote Sensing Images |
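Proposes the S2C framework, which uses visual foundation models and contrastive learning for unsupervised change detection in multimodal remote sensing images. |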
contrastive learning foundation model multimodal |
|
|
| 4 |
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm |
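RealSyn: an effective and scalable multimodal interleaved document transformation paradigm that improves contrastive vision-language representation learning. |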
representation learning multimodal zero-shot transfer |
✅ |
|
| 5 |
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image |
CAST: proposes a component-aligned method for 3D scene reconstruction from a single RGB image |
MAE scene reconstruction penetration |
|
|
| 6 |
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning |
RAD: trains an end-to-end autonomous driving policy via large-scale 3DGS-based reinforcement learning |
reinforcement learning imitation learning 3DGS |
✅ |
|
| 7 |
DAMamba: Vision State Space Model with Dynamic Adaptive Scan |
Proposes a dynamic adaptive scan to address the limitations of vision state space models |
Mamba SSM state space model |
✅ |
|
| 8 |
RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation |
RecDreamer addresses the multi-face Janus problem in text-to-3D generation via uniform score distillation |
dreamer distillation |
|
|
| 9 |
Contrast-Unity for Partially-Supervised Temporal Sentence Grounding |
Proposes the Contrast-Unity framework to address partially-supervised temporal sentence grounding while reducing annotation cost. |
contrastive learning TAMP |
|
|