| 1 |
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models |
Proposes FlowCE, a multi-dimensional evaluation of flowchart comprehension for multimodal large language models. |
large language model multimodal |
✅ |
|
| 2 |
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding |
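Evaluates the visual perception of multimodal large language models in understanding piglet activity, with GPT-4o performing notably well. |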
large language model multimodal |
|
|
| 3 |
Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings |
Proposes the Industrial Language-Image Dataset (ILID) and explores transfer learning of vision foundation models to industrial settings. |
large language model foundation model multimodal |
|
|
| 4 |
What is the Visual Cognition Gap between Humans and Multimodal LLMs? |
Proposes the MaRs-VQA dataset to evaluate the visual cognitive reasoning abilities of multimodal large language models. |
large language model multimodal |
|
|
| 5 |
BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation |
BrainSegFounder: a 3D medical-imaging foundation model for neuroimage segmentation. |
foundation model |
✅ |
|
| 6 |
Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding |
Proposes the Pun Rebus Art Dataset to improve vision-language models' understanding of art in the context of Chinese culture. |
multimodal |
|
|
| 7 |
SmartRSD: An Intelligent Multimodal Approach to Real-Time Road Surface Detection for Safe Driving |
SmartRSD: an intelligent multimodal approach to real-time road surface detection for safer driving. |
multimodal |
|
|
| 8 |
Localizing Events in Videos with Multimodal Queries |
Proposes the ICQ benchmark and multimodal query adaptation methods for localizing events in videos. |
multimodal |
|
|
| 9 |
ProtoS-ViT: Visual foundation models for sparse self-explainable classifications |
Proposes ProtoS-ViT for sparse, self-explainable classification. |
foundation model |
✅ |
|
| 10 |
SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions |
SemanticSpray++: a multimodal dataset for autonomous driving in wet surface conditions. |
multimodal |
|
|
| 11 |
Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation |
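Combines vision foundation models with unsupervised domain adaptation to improve the performance and efficiency of semantic segmentation. |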
foundation model |
|
|
| 12 |
AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming |
AnimalFormer: a multimodal vision framework for behavior analysis in precision livestock farming. |
multimodal |
|
|
| 13 |
MoME: Mixture of Multimodal Experts for Cancer Survival Prediction |
Proposes MoME, which uses a mixture of multimodal experts to fuse heterogeneous data for cancer survival prediction. |
multimodal |
✅ |
|
| 14 |
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models |
Proposes Med-HallMark, a benchmark for detecting medical hallucinations, and MediHall Score, an evaluation metric, and builds the MediHallDetector model. |
large language model multimodal |
|
|