| 1 |
On Pre-training of Multimodal Language Models Customized for Chart Understanding |
Proposes CHOPINLLM, a multimodal large language model customized to improve chart understanding |
large language model multimodal |
|
|
| 2 |
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding |
Proposes a token-level correlation-guided compression method that improves the efficiency of multimodal document understanding |
large language model multimodal |
|
|
| 3 |
PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding |
PD-APE: a parallel decoding framework with adaptive position encoding for 3D visual grounding |
visual grounding |
|
|
| 4 |
Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification |
PIMPNet: a patch-based multimodal prototype network for Alzheimer's disease classification |
multimodal |
|
|
| 5 |
Visual Text Generation in the Wild |
Proposes SceneVTG, a visual text generator that produces high-quality, practical text images in complex scenes |
large language model multimodal |
|
|
| 6 |
Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance |
Proposes Semantic-CC, which uses foundational knowledge and semantic guidance to improve remote sensing image change captioning |
large language model foundation model |
|
|
| 7 |
EVLM: An Efficient Vision-Language Model for Visual Understanding |
Proposes EVLM, an efficient vision-language model for improving visual understanding |
large language model |
|
|
| 8 |
Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection |
Seismic Fault SAM: adapts SAM with lightweight modules and a 2.5D strategy for seismic fault detection |
foundation model |
|
|
| 9 |
Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization |
Proposes Img2CAD, which reverse engineers 3D CAD models from images through VLM-assisted conditional factorization |
foundation model |
✅ |
|