| 1 |
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models |
EAGLE: enhances visual grounding to minimize hallucinations in instruction-tuned multimodal models |
large language model, multimodal, visual grounding |
|
|
| 2 |
Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild |
Proposes the Socratic Questioning framework to improve multimodal LLM performance on complex visual reasoning |
large language model, multimodal, chain-of-thought |
|
|
| 3 |
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets |
CM3T: an efficient multimodal learning framework for inhomogeneous interaction datasets |
multimodal |
|
|
| 4 |
MObI: Multimodal Object Inpainting Using Diffusion Models |
MObI: proposes a diffusion-model-based multimodal object inpainting framework for data augmentation in autonomous driving scenes |
multimodal |
|
|
| 5 |
MVP: Multimodal Emotion Recognition based on Video and Physiological Signals |
Proposes the MVP model, which fuses video and physiological signals to improve emotion recognition over long temporal sequences |
multimodal |
|
|
| 6 |
FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection |
FoundPAD: uses "reloaded" foundation models for face presentation attack detection |
foundation model |
✅ |
|
| 7 |
Large Language Models for Video Surveillance Applications |
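Proposes a vision-language-model-based method for generating video surveillance summaries, improving analysis accuracy and efficiency |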
large language model |
|
|
| 8 |
Visual Large Language Models for Generalized and Specialized Applications |
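Surveys applications of visual large language models in general-purpose and specialized settings and discusses their challenges and future directions |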
large language model |
✅ |
|
| 9 |
Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? |
Ultrasound-QBench: uses multimodal large language models to assist quality assessment of ultrasound images |
large language model, multimodal |
|
|
| 10 |
SAM-EM: Real-Time Segmentation for Automated Liquid Phase Transmission Electron Microscopy |
SAM-EM: a real-time segmentation method for automated liquid-phase transmission electron microscopy |
foundation model |
✅ |
|
| 11 |
CAT: Content-Adaptive Image Tokenization |
Proposes CAT, a content-adaptive image tokenization method that improves image reconstruction and generation |
large language model |
|
|
| 12 |
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild |
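SceneVTG++: proposes a controllable multilingual visual text generation method for scene images, addressing the difficulty of generating text in natural-scene images |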
large language model |
|
|
| 13 |
MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs |
Proposes MDP3 to address the frame selection problem in video large language models |
large language model |
|
|
| 14 |
Found in Translation: semantic approaches for enhancing AI interpretability in face verification |
Proposes a semantic-concept-based XAI framework to improve the interpretability of face verification models |
large language model |
|
|