| 1 |
Emergent Morphing Attack Detection in Open Multi-modal Large Language Models |
Zero-shot detection of face morphing attacks using open multimodal large language models |
large language model multimodal |
|
|
| 2 |
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models |
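Proposes the R3 framework to resolve the optimization dilemma between generation and understanding capabilities in multimodal models |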
multimodal |
✅ |
|
| 3 |
Concept-Enhanced Multimodal RAG: Towards Interpretable and Accurate Radiology Report Generation |
Proposes CEMRAG, a concept-enhanced multimodal RAG framework, to improve the interpretability and accuracy of radiology report generation |
multimodal |
|
|
| 4 |
CREMD: Crowd-Sourced Emotional Multimodal Dogs Dataset |
Proposes the CREMD dataset for studying how different modalities and annotator characteristics affect dog emotion recognition |
multimodal |
|
|
| 5 |
Effective and Robust Multimodal Medical Image Analysis |
Proposes the MAIL and Robust-MAIL networks for effective and robust multimodal medical image analysis |
multimodal |
✅ |
|
| 6 |
Training-Free Zero-Shot Anomaly Detection in 3D Brain MRI with 2D Foundation Models |
Proposes a training-free, zero-shot anomaly detection method for 3D brain MRI based on pretrained 2D models |
foundation model |
|
|
| 7 |
Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation |
Proposes a retrieval-augmented framework to improve the efficiency and stability of LLMs in vision-and-language navigation |
VLN large language model |
|
|
| 8 |
Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs |
Proposes PADE, which uses internal attention dynamics to enhance core visual regions and mitigate hallucination in LVLMs |
multimodal visual grounding |
|
|
| 9 |
VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation |
VideoSketcher: uses pretrained video models for versatile sequential sketch generation |
large language model |
|
|
| 10 |
Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting |
Combines meteorological data, sky images, and deep models to improve the accuracy of photovoltaic power forecasting |
multimodal |
|
|
| 11 |
GMAIL: Generative Modality Alignment for generated Image Learning |
GMAIL: a generative modality alignment framework that improves the utilization of generated images in vision-language tasks |
multimodal |
|
|
| 12 |
Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs |
Sparrow: proposes a text-anchored window attention mechanism to accelerate inference in video LLMs |
large language model |
|
|