| 1 |
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image |
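Proposes MOSABench, a benchmark for evaluating the image understanding of multimodal large language models in multi-object sentiment analysis. |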
large language model multimodal |
|
|
| 2 |
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering |
Proposes ReflectiVA, which augments multimodal LLMs with self-reflective tokens for knowledge-based visual question answering. |
large language model multimodal |
✅ |
|
| 3 |
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models |
Chat2SVG: a vector graphics generation framework that combines large language models with image diffusion models. |
large language model |
|
|
| 4 |
ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images |
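Proposes ENCLIP, which uses ensembling and clustering to improve CLIP's performance on fashion multimodal search with limited data and low-quality images. |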
multimodal |
|
|
| 5 |
Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models |
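Proposes DiffuBias, which uses latent diffusion models and large language models to make classifiers more robust and to address the problem of learned bias. |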
large language model |
|
|
| 6 |
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge |
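Proposes Entity-Enhanced Cognitive Alignment (EECA) to address the alignment between visual knowledge and the language model's cognitive framework in LVLMs. |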
large language model multimodal |
|
|
| 7 |
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation |
MMSLT: gloss-free sign language translation using multimodal large language models. |
large language model multimodal |
✅ |
|
| 8 |
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation |
Proposes LaB-RAG, which uses label-boosted retrieval-augmented generation to improve radiology report generation. |
large language model |
|
|
| 9 |
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages |
Proposes ALM-bench for evaluating the understanding and reasoning abilities of LMMs across 100 culturally diverse languages. |
multimodal |
|
|