| # | Title | Summary | Keywords | |
|---|---|---|---|---|
| 1 | ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Proposes ForgeryGPT, which uses a multimodal large language model for explainable image forgery detection and localization. | large language model, multimodal, instruction following | |
| 2 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Proposes X-Fi, a modality-invariant foundation model for multimodal human sensing. | foundation model, multimodal | |
| 3 | TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning | Proposes the TWIST & SCOUT framework, which improves the visual grounding ability of MLLMs through forget-free tuning. | large language model, multimodal, visual grounding | |
| 4 | EchoApex: A General-Purpose Vision Foundation Model for Echocardiography | EchoApex: a general-purpose vision foundation model for echocardiography. | foundation model | |
| 5 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | TemporalBench: a benchmark of fine-grained temporal understanding for multimodal video models. | multimodal | |
| 6 | Towards Foundation Models for 3D Vision: How Close Are We? | Proposes the UniQA-3D benchmark to evaluate and improve the capabilities of 3D vision foundation models. | foundation model | ✅ |
| 7 | CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | Proposes CAFuser, a condition-aware multimodal fusion method that makes semantic perception of driving scenes more robust. | multimodal | ✅ |
| 8 | MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks | MEGA-Bench: a multimodal evaluation benchmark of 500+ real-world tasks covering a broad range of application scenarios. | multimodal | |
| 9 | Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection | Proposes IMBALMED, a class-balancing diversity multimodal ensemble method for early diagnosis of Alzheimer's disease. | multimodal | |
| 10 | Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification | Proposes deep learning models based on multimodal data fusion to improve breast cancer classification performance. | multimodal | |
| 11 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Proposes MMIE, a large-scale multimodal interleaved comprehension benchmark for evaluating large vision-language models. | multimodal | ✅ |
| 12 | LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Proposes LiveXiv, a live multimodal benchmark built from arXiv paper content for evaluating large multimodal models. | foundation model | |
| 13 | Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | Proposes the SpatialSonic model for language-driven immersive spatial audio generation. | multimodal | |
| 14 | Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework | Proposes GTL, a generative transfer learning framework for cross-modal few-shot learning. | multimodal | |
| 15 | MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Proposes the MoTE framework, which balances generalization and task-specific performance in video recognition. | foundation model | ✅ |
| 16 | Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features | Proposes a hybrid Transformer model that fuses handwriting images with signal features for early detection of Alzheimer's disease. | multimodal | |
| 17 | Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation | Proposes SAEP, a spatial-aware efficient projector that improves MLLM efficiency and spatial understanding through multi-layer feature aggregation. | multimodal | |