| 1 |
Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models |
Proposes a multimodal large language model based on multi-agent deep research for multimedia content verification |
large language model multimodal |
|
|
| 2 |
OmniVec2 -- A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
OmniVec2: a novel Transformer network for large-scale multimodal and multitask learning |
multimodal |
|
|
| 3 |
SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights |
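SFOOD: builds a large-scale multimodal benchmark for food attribute analysis, incorporating spectral information to go beyond the limits of RGB |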
multimodal |
|
|
| 4 |
ViTaL: A Multimodality Dataset and Benchmark for Multi-pathological Ovarian Tumor Recognition |
Proposes the ViTaL dataset and ViTaL-Net for multimodal recognition of multi-pathological ovarian tumors |
multimodal |
✅ |
|
| 5 |
ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts |
ZERO: an industry-oriented vision foundation model with multi-modal prompts that achieves zero-shot generalization |
foundation model |
|
|
| 6 |
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step |
CoT-Diff: strengthens spatial layout alignment in text-to-image generation through chain-of-thought reasoning |
large language model multimodal |
|
|
| 7 |
Computed Tomography Visual Question Answering with Cross-modal Feature Graphing |
Proposes a CT visual question answering framework based on cross-modal feature graphs, improving diagnostic accuracy |
large language model multimodal |
|
|
| 8 |
SeqTex: Generate Mesh Textures in Video Sequence |
SeqTex: proposes a method for generating mesh textures in a video sequence, enabling end-to-end UV texture mapping |
foundation model |
|
|