| 1 |
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding |
Proposes Omni-Weather, a unified multimodal model that addresses the separation between weather generation and weather understanding. |
foundation model multimodal chain-of-thought |
|
|
| 2 |
TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References |
TrackTeller: proposes a temporal multimodal 3D grounding method for resolving behavior-dependent object references |
multimodal language conditioned |
|
|
| 3 |
Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models |
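Proposes Scene-VLM, which uses vision-language models for multimodal video scene segmentation and significantly improves long-video understanding. |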
multimodal |
|
|
| 4 |
A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets |
Proposes A-QCF-Net for liver tumor segmentation from unpaired CT/MRI data, enabling cross-modal knowledge transfer. |
multimodal |
|
|
| 5 |
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture |
UniPercept: a unified perceptual-level image understanding framework covering aesthetics, quality, structure, and texture |
large language model multimodal visual grounding |
|
|
| 6 |
Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification |
Parameter-efficient training with frozen encoders improves multimodal chest X-ray classification performance |
multimodal |
|
|
| 7 |
The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency |
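Proposes the B&J orthopedic clinical reasoning benchmark, revealing a significant gap in the clinical competency of vision-language models |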
large language model foundation model multimodal |
|
|
| 8 |
TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant |
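Proposes the TAME framework and the LCMP benchmark to address the challenges multimodal large language models face in personalized long-horizon dialogue. |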
large language model multimodal |
|
|
| 9 |
FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound |
Proposes Fetal-Gauge, a vision-language benchmark for fetal ultrasound, to evaluate and improve VLM performance in prenatal diagnosis. |
multimodal visual grounding |
|
|
| 10 |
LLM-Free Image Captioning Evaluation in Reference-Flexible Settings |
Proposes Pearl, an LLM-free image captioning evaluation metric that improves evaluation performance in reference-flexible settings |
large language model |
|
|
| 11 |
Hierarchy-Aware Fine-Tuning of Vision-Language Models |
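Proposes a hierarchy-aware fine-tuning framework that efficiently improves vision-language model performance on hierarchical classification tasks. |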
multimodal |
|
|