| 1 |
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding |
Proposes Insect-LLaVA, a multimodal foundation model and dataset for visual insect understanding |
foundation model multimodal |
|
|
| 2 |
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence |
Granite Vision: a lightweight, open-source multimodal model designed for enterprise intelligence |
large language model multimodal instruction following |
✅ |
|
| 3 |
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models |
Proposes V2V-LLM to address perception and planning problems in vehicle-to-vehicle cooperative autonomous driving |
large language model |
✅ |
|
| 4 |
PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation |
PolyPath: uses a large multimodal model for multi-slide pathology report generation |
multimodal |
|
|
| 5 |
TSP3D: Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding |
Proposes TSP3D, a text-guided sparse voxel pruning method for efficient 3D visual grounding |
visual grounding |
✅ |
|
| 6 |
Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling |
Proposes the Attention-Guided Concept Model (AGCM) for interpretable multimodal human behavior modeling |
multimodal |
|
|
| 7 |
KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models |
Proposes KKA, which uses anomaly-related knowledge from large language models to improve visual anomaly detection |
large language model |
✅ |
|
| 8 |
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations |
Survey: a comprehensive analysis of the safety of large vision-language models (LVLMs), covering attacks, defenses, and evaluations |
multimodal |
✅ |
|
| 9 |
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types |
TaskGalaxy: scales multimodal instruction fine-tuning with tens of thousands of vision task types |
multimodal |
✅ |
|