| 1 |
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective |
Survey: autoregressive methods for unifying understanding and generation in vision foundation models |
large language model, foundation model |
✅ |
|
| 2 |
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation |
Proposes MM-FSS, a multimodal fusion network for few-shot 3D point cloud semantic segmentation. |
multimodal |
✅ |
|
| 3 |
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising |
ContextIQ: a multimodal expert-based video retrieval system for contextual advertising |
multimodal |
|
|
| 4 |
A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection |
Survey: RGB, 3D, and multimodal methods for unsupervised industrial image anomaly detection |
multimodal |
✅ |
|
| 5 |
Enhanced Survival Prediction in Head and Neck Cancer Using Convolutional Block Attention and Multimodal Data Fusion |
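Proposes a deep learning model based on CBAM and multimodal fusion that improves survival prediction accuracy for head and neck cancer. |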
multimodal |
|
|
| 6 |
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data |
Proposes a self-synthesis approach that trains vision-language models in a way that mirrors human cognitive development. |
large language model, multimodal |
|
|
| 7 |
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration |
VL-Cache: a sparsity- and modality-aware KV cache compression method for accelerating vision-language model inference |
large language model |
|
|
| 8 |
A Lightweight Dual-Branch System for Weakly-Supervised Video Anomaly Detection on Consumer Edge Devices |
Proposes RuleVAD, a lightweight dual-branch system for weakly-supervised video anomaly detection on consumer edge devices. |
multimodal |
|
|