| 1 |
Can Large Language Models Capture Video Game Engagement? |
Evaluates the ability of large language models to capture player engagement in video games |
large language model multimodal |
|
|
| 2 |
ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models |
ZISVFM: zero-shot object instance segmentation for indoor robots using vision foundation models |
foundation model |
✅ |
|
| 3 |
DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation |
DILLEMA: multimodal data augmentation with diffusion models and large language models to improve the robustness of deep learning models |
large language model |
|
|
| 4 |
Driver Assistance System Based on Multimodal Data Hazard Detection |
Proposes a driver assistance system based on multimodal data fusion that improves detection accuracy for abnormal driving events. |
multimodal |
|
|
| 5 |
RadVLM: A Multitask Conversational Vision-Language Model for Radiology |
RadVLM: a multitask conversational vision-language model for radiology |
foundation model visual grounding |
|
|
| 6 |
Expertized Caption Auto-Enhancement for Video-Text Retrieval |
Proposes an expertized caption auto-enhancement method to address information mismatch in video-text retrieval. |
large language model multimodal |
✅ |
|
| 7 |
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding |
Proposes MaxInfo, a training-free key-frame selection method that improves video understanding |
large language model |
✅ |
|
| 8 |
Tell2Reg: Establishing spatial correspondence between images by the same language prompts |
Tell2Reg: establishes spatial correspondence between images using the same language prompts, enabling training-free image registration |
multimodal |
✅ |
|