| 1 |
Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance |
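Proposes a reasoning-enhanced, domain-adaptive pretraining method for multimodal large language models, aimed at short video content governance |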
large language model, multimodal, chain-of-thought |
|
|
| 2 |
Instruction-tuned Self-Questioning Framework for Multimodal Reasoning |
Proposes SQ-InstructBLIP, an instruction-tuned self-questioning framework, to strengthen multimodal reasoning |
large language model, multimodal |
|
|
| 3 |
X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning |
Proposes X-CoT, which uses LLM chain-of-thought reasoning for explainable text-to-video retrieval |
chain-of-thought |
✅ |
|
| 4 |
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding |
VideoJudge: uses bootstrapping to enable scalable supervision of MLLM-as-a-judge for video understanding |
large language model, multimodal, chain-of-thought |
|
|
| 5 |
A Sentinel-3 foundation model for ocean colour |
Proposes a Sentinel-3-based foundation model for ocean colour that improves performance on ocean observation tasks |
foundation model |
|
|
| 6 |
Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations |
Decipher-MR: a vision-language foundation model for 3D MRI representations |
foundation model |
|
|
| 7 |
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models |
Proposes CompareBench, a benchmark for evaluating visual comparison reasoning in vision-language models |
multimodal |
|
|