| 1 |
Multimodal Deep Learning for Phyllodes Tumor Classification from Ultrasound and Clinical Data |
Proposes a multimodal deep learning framework to improve phyllodes tumor classification accuracy |
multimodal |
|
|
| 2 |
Foundation Model-Driven Classification of Atypical Mitotic Figures with Domain-Aware Training Strategies |
Proposes a foundation-model-based classification method for recognizing atypical mitotic figures |
foundation model |
|
|
| 3 |
From Drone Imagery to Livability Mapping: AI-powered Environment Perception in Rural China |
Proposes a vision-language contrastive ranking framework for rural environment perception |
large language model multimodal chain-of-thought |
|
|
| 4 |
MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning |
Proposes MM-SeR to address the reliability of lightweight image captioning |
multimodal |
|
|
| 5 |
Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety |
Proposes Safe-LLaVA to address biometric information leakage in multimodal large language models |
large language model multimodal |
|
|
| 6 |
DriveQA: Passing the Driving Knowledge Test |
Proposes DriveQA to address the challenges of driving knowledge tests |
large language model multimodal |
|
|
| 7 |
Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer |
Proposes a multimodal fusion method to improve the accuracy of recurrence risk prediction in renal cancer |
foundation model multimodal |
|
|
| 8 |
Generative AI for Industrial Contour Detection: A Language-Guided Vision System |
Proposes a language-guided generative vision system for industrial contour detection |
multimodal |
|
|
| 9 |
Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments |
Proposes Waste-Bench to address the evaluation of VLLMs in cluttered environments |
large language model |
|
|
| 10 |
Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations |
Proposes CLIP-DCA to address challenges in evaluating domain generalization |
foundation model |
|
|
| 11 |
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR |
Proposes line-level OCR to overcome the limitations of word-level OCR |
large language model |
✅ |
|
| 12 |
How Well Do Vision-Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View Images |
Poses a new urban spatial reasoning challenge to improve vision-language model performance |
chain-of-thought |
|
|