| 1 |
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning |
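Proposes visualization-referenced instruction tuning to improve the performance of multimodal large language models on chart question answering. |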
large language model multimodal |
✅ |
|
| 2 |
Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images |
Proposes an urban safety perception assessment method that combines multimodal large language models with street view images. |
large language model multimodal |
|
|
| 3 |
Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks |
Proposes a multimodal triplet network method for classifying freshwater snails of the genus Radomaniola. |
multimodal |
|
|
| 4 |
Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS |
Twins-PainViT: a modality-agnostic Vision Transformer framework for multimodal automatic pain assessment from facial videos and fNIRS. |
multimodal |
|
|
| 5 |
Diffusion Feedback Helps CLIP See Better |
DIVA: uses diffusion model feedback to improve CLIP's fine-grained visual abilities. |
large language model multimodal |
✅ |
|
| 6 |
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues |
Proposes BRIDGE, which bridges gaps in image captioning evaluation by strengthening visual cues. |
multimodal |
✅ |
|
| 7 |
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing |
Proposes SANE, which uses an LLM to decompose instructions and resolve ambiguity in text-based image editing. |
large language model |
✅ |
|
| 8 |
FlexAttention for Efficient High-Resolution Vision-Language Models |
FlexAttention: an efficient attention mechanism for high-resolution vision-language models. |
multimodal |
|
|
| 9 |
Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter |
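Proposes AMR, an attention-based modality reweighting method that improves the adversarial robustness of RGB-skeleton action recognition models. |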
multimodal |
|
|
| 10 |
Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture |
Proposes a Vision-MLP method for automatic pain assessment based on GAN-synthesized thermal videos. |
multimodal |
|
|