| 1 | Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment | Proposes MIVPG, a multi-instance visual prompt generator that enriches visual representations in multimodal large language models. | large language model, multimodal | | |
| 2 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | PlugIR: interactive text-to-image retrieval with large language models, requiring no fine-tuning. | large language model, instruction following | ✅ | |
| 3 | Identification of Stone Deterioration Patterns with Large Multimodal Models | Uses large multimodal models to identify stone deterioration patterns in support of cultural heritage conservation. | multimodal | | |
| 4 | Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI | Proposes a radiomics-guided multimodal self-attention network for predicting pathological complete response from breast MRI. | multimodal | | |
| 5 | AD-H: Autonomous Driving with Hierarchical Agents | Proposes AD-H, an autonomous driving system built on hierarchical agents that improves generalization and interpretability. | large language model, multimodal | ✅ | |
| 6 | DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut | DiffCut: zero-shot semantic segmentation driven by diffusion model features and a recursive normalized cut. | foundation model, multimodal | | |
| 7 | Exploiting LMM-based knowledge for image classification tasks | Improves image classification with LMM knowledge by fusing image and text embeddings. | multimodal | | |
| 8 | Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision | Proposes Adapter-X, a general parameter-efficient fine-tuning framework for vision that outperforms full fine-tuning. | foundation model | | |
| 9 | Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning | Proposes AENet, which improves generalization in zero-shot learning through semantically enriched visual prompts. | zero-shot transfer | | |
| 10 | Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Proposes a weighted visual-text cross-alignment method that improves the zero-shot performance of vision-language models. | large language model | | |
| 11 | PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | PosterLLaVa: builds a unified multimodal layout generator on top of a multimodal large language model. | large language model | ✅ | |