| 23 |
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models |
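Proposes the Med-RwR framework, which strengthens the reasoning of medical multimodal large language models through proactive retrieval. |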
reinforcement learning large language model multimodal |
✅ |
|
| 24 |
CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder |
CovMatch: multimodal dataset distillation using cross-covariance guidance and a trainable text encoder. |
contrastive learning distillation multimodal |
|
|
| 25 |
Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models |
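Proposes VFM-VAE, which uses a vision foundation model directly as the tokenizer for latent diffusion models, significantly improving generation quality and efficiency. |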
distillation foundation model |
|
|
| 26 |
Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs |
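Proposes a masked-prediction method for activating visual context and commonsense, improving the reasoning of vision-language models in multimodal settings. |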
reinforcement learning large language model multimodal |
|
|
| 27 |
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback |
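Proposes Diffusion-DRO, which improves the user-preference alignment of diffusion models through ranking-based optimization and online negative samples. |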
reinforcement learning inverse reinforcement learning preference learning |
✅ |
|
| 28 |
OmniNWM: Omniscient Driving Navigation World Models |
OmniNWM: an omniscient panoramic navigation world model for autonomous driving. |
world model metric depth |
✅ |
|
| 29 |
Embodied Navigation with Auxiliary Task of Action Description Prediction |
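Proposes an embodied navigation method that uses action description prediction as an auxiliary task, improving navigation performance and interpretability. |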
reinforcement learning distillation multimodal |
|
|
| 30 |
UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning |
Proposes UniHPR, which unifies multimodal human pose representations through singular value contrastive learning. |
representation learning contrastive learning |
|
|
| 31 |
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents |
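Proposes VC2L, a vision-centric contrastive learning approach that handles representation learning for multimodal web documents in a unified way. |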
representation learning contrastive learning multimodal |
✅ |
|
| 32 |
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder |
ProCLIP: a progressive vision-language alignment framework built on an LLM-based embedder, improving CLIP's ability to handle long text. |
contrastive learning curriculum learning distillation |
✅ |
|