| 1 |
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning |
Proposes an LLM-based, player-centric multimodal prompt generation network for identity-aware basketball video captioning |
large language model multimodal |
✅ |
|
| 2 |
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios |
The first survey of multimodal long-context token compression, covering images, videos, and audio |
large language model multimodal |
|
|
| 3 |
Can Foundation Models Predict Fitness for Duty? |
Uses iris images and pretrained models to predict whether a person is fit for duty |
foundation model |
|
|
| 4 |
ModalFormer: Multimodal Transformer for Low-Light Image Enhancement |
Proposes ModalFormer to address low-light image enhancement |
multimodal |
✅ |
|
| 5 |
L-MCAT: Unpaired Multimodal Transformer with Contrastive Attention for Label-Efficient Satellite Image Classification |
L-MCAT: a contrastive-attention multimodal Transformer for weakly supervised satellite image classification |
multimodal |
|
|
| 6 |
MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification |
MIRepNet: a dedicated pretrained model and pipeline for EEG-based motor imagery classification |
foundation model |
✅ |
|
| 7 |
Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models |
Proposes the MECo framework, which uses large language models for motion-example-controlled co-speech gesture generation |
large language model |
✅ |
|
| 8 |
Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality |
Proposes an image-text data quality filtering framework based on compact VLMs to improve training data quality |
large language model multimodal |
✅ |
|
| 9 |
SAMwave: Wavelet-Driven Feature Enrichment for Effective Adaptation of Segment Anything Model |
SAMwave: uses wavelet transforms to enrich features, improving the adaptability of SAM to complex tasks |
foundation model |
|
|