| 1 | Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | Proposes Period-LLM, which improves the performance of multimodal large language models on periodic tasks. | large language model multimodal | ✅ | |
| 2 | Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts | Mixpert: mitigates multimodal learning conflicts with an efficient mixture of vision experts. | large language model multimodal | | |
| 3 | DisTime: Distribution-based Time Representation for Video Large Language Models | DisTime: a distribution-based time representation method for video large language models. | large language model TAMP | ✅ | |
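| 4 | Reasoning Can Hurt the Inductive Abilities of Large Language Models | Finds that chain-of-thought reasoning can impair the inductive abilities of large language models, and proposes ways to mitigate this. | large language model chain-of-thought | | |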
| 5 | Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Agent-X: a large-scale benchmark for evaluating the multimodal reasoning abilities of vision-centric agents. | multimodal | ✅ | |
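| 6 | Geospatial Foundation Models to Enable Progress on Sustainable Development Goals | Proposes SustainFM, a benchmark framework for assessing the potential of geospatial foundation models to support the Sustainable Development Goals. | foundation model | | |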
| 7 | Beyond Quantity: Distribution-Aware Labeling for Visual Grounding | Proposes the DAL framework, which improves visual grounding performance through distribution-aware pseudo-labeling. | visual grounding | | |
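| 8 | From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models | A unified theoretical framework that reveals the intrinsic connection between hallucinations and jailbreak attacks in large models. | foundation model | | |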
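| 9 | Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Proposes MVPBench, a graph-based benchmark for evaluating the multi-step reasoning of multimodal large language models on visual physical commonsense reasoning. | large language model multimodal chain-of-thought | | |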
| 10 | The Butterfly Effect in Pathology: Exploring Security in Pathology Foundation Models | A study of adversarial attacks on pathology foundation models that reveals security risks in whole-slide image (WSI) analysis. | foundation model | ✅ | |
| 11 | CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs | Proposes CSVQA, a Chinese multimodal benchmark for evaluating the reasoning abilities of VLMs in STEM domains. | multimodal | ✅ | |
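| 12 | Federated Foundation Model for GI Endoscopy Images | Proposes a federated-learning-based foundation model for gastrointestinal endoscopy images, addressing the difficulty of training models under data-privacy constraints. | foundation model | | |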
| 13 | SiLVR: A Simple Language-based Video Reasoning Framework | Proposes SiLVR, a framework that uses language models to strengthen video understanding and reasoning without additional training. | large language model multimodal | | |
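| 14 | SORCE: Small Object Retrieval in Complex Environments | SORCE: introduces a new benchmark for text-based small object retrieval in complex environments, along with a multi-embedding representation method. | large language model multimodal | | |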
| 15 | Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders | Proposes Nar-KFC, which uses narrative keyframes to improve the long-video comprehension of MLLMs. | large language model multimodal | | |
| 16 | Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation | Geo-Sign: uses hyperbolic contrastive regularization to improve geometry-aware sign language translation. | large language model | ✅ | |
| 17 | ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | ViStoryBench: a comprehensive benchmark for story visualization covering diverse narrative structures and styles. | large language model | | |
| 18 | Conformal Prediction for Zero-Shot Models | Proposes Conf-OT, which improves the efficiency of conformal prediction for zero-shot models under domain shift. | foundation model | | |