| 1 | Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models | SPARC: selective progressive attention recalibration for detailed image captioning in multimodal large language models | large language model multimodal | |
| 2 | Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective | Surveys training paradigms for vision-language large models, focusing on parameter-efficient modality-fusion methods | large language model multimodal | |
| 3 | Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting | Proposes a foundation-model-based method for estimating apple ripeness and size to support selective harvesting | foundation model | |
| 4 | Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Uses large-scale robust image encoders to improve the robustness of multimodal large language models against adversarial attacks | large language model | ✅ |
| 5 | AdaSVD: Adaptive Singular Value Decomposition for Large Language Models | Proposes AdaSVD, an adaptive SVD approach to compressing large language models while preserving performance | large language model | ✅ |
| 6 | Language-to-Space Programming for Training-Free 3D Visual Grounding | Proposes LaSP, a training-free 3D visual grounding method that improves both efficiency and accuracy | visual grounding | |
| 7 | The in-context inductive biases of vision-language models differ across modalities | Studies how the in-context inductive biases of vision-language models differ across modalities | foundation model | |