| 1 |
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine |
MedTrinity-25M: a large-scale multimodal medical dataset that supports multigranular annotations and pretraining of medical AI models. |
large language model foundation model multimodal |
|
|
| 2 |
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI |
GMAI-MMBench: builds a comprehensive multimodal medical evaluation benchmark to advance general medical AI. |
multimodal |
|
|
| 3 |
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline |
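Proposes an in-the-wild multimodal plant disease recognition dataset and a multi-prototype fusion baseline model, addressing the difficulty of small inter-class variation and large intra-class variation. |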
multimodal |
|
|
| 4 |
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning |
Proposes a unified multimodal framework based on LLM neuron tuning to address generality across multiple tasks. |
multimodal |
|
|
| 5 |
WWW: Where, Which and Whatever Enhancing Interpretability in Multimodal Deepfake Detection |
Proposes the FakeMix benchmark and new metrics to improve the interpretability of multimodal deepfake detection in dynamic scenarios. |
multimodal |
|
|
| 6 |
Targeted Visual Prompting for Medical Visual Question Answering |
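Proposes a targeted visual prompting method that improves the region understanding of multimodal large language models in medical visual question answering. |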
large language model multimodal |
✅ |
|
| 7 |
Set2Seq Transformer: Temporal and Positional-Aware Set Representations for Sequential Multiple-Instance Learning |
Proposes Set2Seq Transformer for temporal and positional-aware set representations in sequential multiple-instance learning. |
multimodal |
|
|
| 8 |
LLaVA-OneVision: Easy Visual Task Transfer |
LLaVA-OneVision: a single model that transfers visual tasks across single-image, multi-image, and video scenarios. |
multimodal |
|
|
| 9 |
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models |
Proposes a sample-agnostic adversarial perturbation method to improve the security of vision-language pre-training models. |
multimodal |
✅ |
|