| 1 |
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders |
Reveals redundancy in MLLMs with multiple vision encoders and proposes utilization and information-gap metrics for diagnosis. |
large language model multimodal |
|
|
| 2 |
ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays |
ChestGPT: a framework integrating LLMs and ViTs for disease detection and localization in chest X-rays |
large language model |
|
|
| 3 |
Sign Spotting Disambiguation using Large Language Models |
Proposes a training-free, LLM-based disambiguation framework for sign spotting, improving sign language recognition quality. |
large language model |
|
|
| 4 |
Dynamic Multimodal Prototype Learning in Vision-Language Models |
Proposes ProtoMM, which improves test-time adaptation of vision-language models via dynamic multimodal prototype learning. |
multimodal |
|
|
| 5 |
Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation |
Causal-SAM-LLM: uses large language models for causal reasoning to improve the robustness of medical segmentation |
large language model |
|
|
| 6 |
Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding |
Proposes a multimodal alignment framework based on cross-attentive GRUs for fine-grained video understanding |
multimodal |
|
|
| 7 |
MolVision: Molecular Property Prediction with Vision Language Models |
MolVision: molecular property prediction with vision-language models, improving predictive performance and generalization. |
large language model multimodal |
✅ |
|
| 8 |
Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor |
Proposes global-alignment and CLIP-similarity metrics for evaluating visual descriptor quality, going beyond plain accuracy. |
foundation model |
|
|
| 9 |
Unlearning the Noisy Correspondence Makes CLIP More Robust |
Proposes the NCU framework, which makes CLIP more robust by unlearning noisy correspondences |
zero-shot transfer |
|
|