| 1 |
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models |
SPORTU: a comprehensive benchmark for evaluating the sports understanding capabilities of multimodal large language models |
large language model multimodal chain-of-thought |
|
|
| 2 |
Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor |
Proposes a training-free adaptor built on pretrained 3D models to address few-shot class-incremental learning on 3D point clouds |
foundation model |
✅ |
|
| 3 |
MiRAGeNews: Multimodal Realistic AI-Generated News Detection |
Proposes the MiRAGeNews dataset and the MiRAGe detector for detecting AI-generated multimodal news content |
multimodal |
|
|
| 4 |
Movie Trailer Genre Classification Using Multimodal Pretrained Features |
Proposes a new method for movie trailer genre classification based on multimodal pretrained features |
multimodal |
|
|
| 5 |
A foundation model for generalizable disease diagnosis in chest X-ray images |
CXRBase: a general-purpose foundation model for disease diagnosis in chest X-ray images |
foundation model |
|
|
| 6 |
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping |
Proposes VLB, a dynamic multimodal evaluation framework that addresses data contamination and fixed complexity in LVLM evaluation |
multimodal |
|
|
| 7 |
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding |
Proposes VERIFIED, a video corpus moment retrieval benchmark for fine-grained video understanding |
large language model foundation model multimodal |
✅ |
|
| 8 |
Can GPTs Evaluate Graphic Design Based on Design Principles? |
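Studies the ability of GPTs to evaluate graphic design, comparing heuristic evaluation based on design principles against human annotations |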
foundation model multimodal |
✅ |
|
| 9 |
Hespi: A pipeline for automatically detecting information from hebarium specimen sheets |
Hespi: a data extraction pipeline that automatically detects information from plant specimens |
large language model multimodal |
|
|
| 10 |
Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers |
Proposes Chain-of-Restoration, enabling multi-task image restoration models to perform zero-shot, step-by-step universal image restoration |
large language model chain-of-thought |
|
|
| 11 |
Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images |
Uses SAM 2 for zero-shot pupil segmentation, achieving performance comparable to specialized models on more than 14 million images |
foundation model |
|
|
| 12 |
Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion |
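Proposes a Q-distribution prediction method based on multimodal fusion to improve prediction accuracy for controlled nuclear fusion |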
multimodal |
|
|