| 1 |
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency |
Proposes the MME-CoT benchmark to evaluate the reasoning capabilities of multimodal models |
large language model multimodal chain-of-thought |
|
|
| 2 |
On the robustness of multimodal language model towards distractions |
Evaluates the robustness of multimodal language models under visual and textual distractions, and proposes mitigation strategies. |
multimodal |
|
|
| 3 |
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models |
ZeroBench: an impossible visual reasoning benchmark designed for contemporary large multimodal models. |
multimodal |
|
|
| 4 |
Multimodal HIE Lesion Segmentation in Neonates: A Comparative Study of Loss Functions |
Proposes an optimized compound loss function for segmenting hypoxic-ischemic encephalopathy (HIE) lesions in neonates. |
multimodal |
|
|
| 5 |
Exploring the Potential of Encoder-free Architectures in 3D LMMs |
Proposes ENEL, the first encoder-free 3D large language model, improving 3D scene understanding |
large language model multimodal |
✅ |
|
| 6 |
A Benchmark for Crime Surveillance Video Analysis with Large Models |
Proposes UCVL, a benchmark for evaluating large models on crime surveillance video analysis |
large language model multimodal |
|
|
| 7 |
A Solver-Aided Hierarchical Language for LLM-Driven CAD Design |
Proposes AIDL, a solver-aided hierarchical language for LLM-driven CAD design |
large language model |
|
|
| 8 |
Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model |
Proposes the MCDM model, which uses motion-prior conditional diffusion to generate long-term, coherent TalkingFace videos |
multimodal |
|
|
| 9 |
From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs |
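Proposes VDEP, which strengthens the alignment between image and text tokens in MLLMs through autoregressive pre-training, improving multimodal understanding. |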
multimodal |
|
|
| 10 |
Evolution of Data-driven Single- and Multi-Hazard Susceptibility Mapping and Emergence of Deep Learning Methods |
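Survey paper: examines the evolution of data-driven single- and multi-hazard susceptibility mapping and the rise of deep learning methods |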
multimodal |
|
|
| 11 |
EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition |
Proposes the EventSTR dataset and the SimC-ESTR framework for scene text recognition from event stream data. |
large language model |
✅ |
|
| 12 |
DiffoRA: Enabling Parameter-Efficient Fine-Tuning via Differential Module Selection |
DiffoRA: parameter-efficient fine-tuning via differential module selection |
large language model |
|
|