| 1 |
Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss |
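Proposes a style ambiguity loss based on multimodal foundation models and clustering to improve the creativity of text-to-image generation models. |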
foundation model multimodal |
|
|
| 2 |
The Use of Multimodal Large Language Models to Detect Objects from Thermal Images: Transportation Applications |
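Uses multimodal large language models to detect objects in thermal images, with applications to intelligent transportation systems |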
large language model multimodal |
|
|
| 3 |
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models |
Survey: multimodal-guided image editing techniques based on text-to-image diffusion models |
multimodal |
✅ |
|
| 4 |
HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models |
HeartBeat: a multimodal conditions-guided diffusion model for controllable echocardiography video synthesis |
multimodal |
|
|
| 5 |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs |
Prism: a framework for decoupling and assessing the capabilities of vision-language models, improving performance and reducing cost |
large language model multimodal |
✅ |
|
| 6 |
E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion |
Introduces E-ANT, a large-scale Chinese-language GUI navigation dataset, to advance the use of multimodal large models on mobile devices |
large language model multimodal |
|
|
| 7 |
Towards Event-oriented Long Video Understanding |
Proposes the Event-Bench benchmark and the VIM method to improve the event-oriented long video understanding of MLLMs |
large language model multimodal |
✅ |
|
| 8 |
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment |
Reveals the negative effects of generative image caption enrichment: bias and hallucination |
large language model |
|
|