| 1 | LISA: A Layer-wise Integration and Suppression Approach for Hallucination Mitigation in Multimodal Large Language Models | Proposes LISA, which mitigates hallucination in multimodal large language models through layer-wise integration and suppression. | large language model, multimodal, visual grounding | ✅ | |
| 2 | Object-centric Video Question Answering with Visual Grounding and Referring | Proposes an object-centric VideoLLM for video question answering, built on visual grounding and referring. | large language model, multimodal, visual grounding | ✅ | |
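| 3 | ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions | Proposes the ChartM$^3$ benchmark for evaluating chart editing under multimodal instructions, and builds a training set to improve model performance. | large language model, multimodal | ✅ | |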
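| 4 | A Survey of Multimodal Hallucination Evaluation and Detection | Surveys methods for evaluating and detecting multimodal hallucination, covering both image-to-text and text-to-image generation tasks. | large language model, multimodal | | |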
| 5 | DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning | DeepJIVE: proposes a deep-learning method for explaining joint and individual variation in multimodal data. | multimodal | | |
| 6 | BEV-LLM: Leveraging Multimodal BEV Maps for Scene Captioning in Autonomous Driving | BEV-LLM: uses multimodal BEV maps for scene captioning in autonomous driving. | multimodal | | |
| 7 | BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection | BridgeNet: a unified multimodal framework that bridges 2D and 3D industrial anomaly detection. | multimodal | ✅ | |
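| 8 | Probing Multimodal Fusion in the Brain: The Dominance of Audiovisual Streams in Naturalistic Encoding | Leverages the dominance of audiovisual streams to probe the neural encoding mechanisms of multimodal fusion in the brain under naturalistic conditions. | multimodal | | |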
| 9 | MedIQA: A Scalable Foundation Model for Prompt-Driven Medical Image Quality Assessment | MedIQA: a scalable foundation model for prompt-driven medical image quality assessment. | foundation model | | |
| 10 | MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents | MMBench-GUI: a hierarchical, multi-platform evaluation framework for GUI agents, aimed at improving automation efficiency. | visual grounding | ✅ | |
| 11 | Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes | Proposes the MuStD network, which fuses LiDAR and RGB data to improve 3D object detection accuracy in outdoor scenes. | multimodal | ✅ | |
| 12 | Closing the Modality Gap for Mixed Modality Search | Proposes GR-CLIP to eliminate CLIP's modality gap in mixed-modality search. | multimodal | | |