| 1 |
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection |
Proposes VMAD, a visual-enhanced multimodal large language model for zero-shot anomaly detection |
large language model multimodal |
|
|
| 2 |
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning |
MM1.5: improves image understanding and multi-image reasoning through data-driven multimodal LLM fine-tuning |
large language model multimodal |
|
|
| 3 |
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval |
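Proposes LECCR, which uses a multimodal LLM to improve the alignment of visual and non-English representations in cross-lingual cross-modal retrieval. |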
large language model multimodal |
✅ |
|
| 4 |
Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis |
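Proposes a multimodal histopathological image registration method based on cell segmentation and point set matching, for integrative cancer analysis. |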
multimodal |
|
|
| 5 |
AI Foundation Model for Heliophysics: Applications, Design, and Implementation |
Designs an AI foundation model for heliophysics and explores its applications using the SDO dataset |
foundation model |
|
|
| 6 |
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection |
OpenKD: opens up prompt diversity to enable zero-shot and few-shot keypoint detection |
large language model foundation model multimodal |
✅ |
|
| 7 |
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration |
UniKE: unified multimodal editing through enhanced knowledge collaboration |
multimodal |
✅ |
|
| 8 |
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering |
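Proposes World to Code, which generates high-quality multimodal data through self-instructed compositional captioning and filtering to improve vision-language model performance. |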
multimodal visual grounding |
✅ |
|
| 9 |
Visual Context Window Extension: A New Perspective for Long Video Understanding |
Proposes a visual context window extension method to address the difficulty large models have with long video understanding |
large language model multimodal |
|
|
| 10 |
Exploring Social Media Image Categorization Using Large Models with Different Adaptation Methods: A Case Study on Cultural Nature's Contributions to People |
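Proposes the FLIPS dataset and explores the use of large models for social media image classification, focusing on Cultural Nature's Contributions to People |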
large language model |
|
|
| 11 |
MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans |
Proposes MM-Conv, a multimodal conversational dataset for improving co-speech gesture generation for virtual humans. |
multimodal |
|
|
| 12 |
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos |
VidAssist: uses LLMs for goal-oriented planning in instructional videos |
large language model |
|
|
| 13 |
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer |
Proposes ACE, a general-purpose image generation and editing model based on a Diffusion Transformer |
large language model |
✅ |
|