| 1 |
PainFormer: a Vision Foundation Model for Automatic Pain Assessment |
PainFormer: a vision foundation model for automatic pain assessment |
foundation model multimodal |
✅ |
|
| 2 |
Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs |
Proposes NeaR, which fine-tunes a CLIP model on labels generated by an MLLM to address vocabulary-free fine-grained visual recognition. |
large language model multimodal |
|
|
| 3 |
Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation |
Proposes a multimodal fusion method based on a cross-attention Transformer to improve the safety of autonomous marine navigation. |
multimodal |
|
|
| 4 |
Grounding Task Assistance with Multimodal Cues from a Single Demonstration |
MICA: a conversational agent that uses multimodal cues from a single demonstration to enhance task assistance |
multimodal |
|
|
| 5 |
Multimodal Doctor-in-the-Loop: A Clinically-Guided Explainable Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer |
Proposes a multimodal doctor-in-the-loop framework for predicting pathological response in non-small cell lung cancer. |
multimodal |
|
|
| 6 |
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging |
Can foundation models effectively segment tumors? A benchmark of lung CT image segmentation |
foundation model |
|
|
| 7 |
Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation |
Proposes a framework for multimodal X-ray imaging and report generation to address medical data generation. |
multimodal |
|
|