| 1 |
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model |
ChatTracker: improving visual tracking performance with a multimodal large language model |
large language model, multimodal |
|
|
| 2 |
KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension |
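Proposes KptLLM, which uses a large language model for semantic keypoint comprehension and addresses the difficulty of capturing pixel-level semantic details. |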
large language model, multimodal, chain-of-thought |
|
|
| 3 |
Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models |
Digi2Real: using face foundation models to bridge the realism gap in face recognition trained on synthetic data |
foundation model |
|
|
| 4 |
A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI |
Proposes the Deep Multi-view Fiber Clustering (DMVFC) framework for functionally consistent white matter parcellation. |
multimodal |
|
|
| 5 |
3D Audio-Visual Segmentation |
Proposes EchoSegnet, which addresses sound-based object segmentation in 3D scenes. |
embodied AI, foundation model |
✅ |
|
| 6 |
Multi-Transmotion: Pre-trained Model for Human Motion Prediction |
Multi-Transmotion: a cross-modal pre-trained model for human motion prediction |
multimodal |
✅ |
|
| 7 |
Adaptive Length Image Tokenization via Recurrent Allocation |
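Proposes an adaptive-length image tokenization method based on recurrent allocation, improving the representational efficiency of vision systems. |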
large language model |
|
|
| 8 |
AM Flow: Adapters for Temporal Processing in Action Recognition |
Proposes AM Flow and temporal-processing adapters that improve the temporal modeling ability of image models for action recognition. |
foundation model |
|
|
| 9 |
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities |
SPECTRUM: proposes a video-captioning framework that combines semantic processing with emotion information. |
multimodal |
|
|
| 10 |
Learning Where to Edit Vision Transformers |
Proposes a hypernetwork-based method for editing ViTs, improving the model's generalization and locality under subpopulation shifts. |
large language model |
✅ |
|