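| 1 | Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection | Proposes POVFNDB, a benchmark for fine-grained evaluation of the perception, understanding, and reasoning abilities of multimodal large language models in video fake news detection. | large language model, multimodal, chain-of-thought |  |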
| 2 | FT-ARM: Fine-Tuned Agentic Reflection Multimodal Language Model for Pressure Ulcer Severity Classification with Reasoning | FT-ARM: an agentic, self-reflective multimodal large language model for grading pressure ulcer severity. | large language model, multimodal |  |
| 3 | Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs | Latent Sketchpad: uses sketched visual thoughts to improve the reasoning ability of multimodal large language models. | large language model, multimodal | ✅ |
| 4 | MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition | Proposes MCIHN, a hybrid network based on multi-path cross-modal interaction, to improve multimodal emotion recognition performance. | multimodal |  |
| 5 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Proposes Ming-Flash-Omni, a sparse, unified architecture for multimodal perception and generation. | multimodal |  |
| 6 | Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks | Proposes Mars-Bench, a Mars science benchmark for evaluating how foundation models perform on Mars-related tasks. | foundation model | ✅ |
| 7 | SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs | Proposes SCOPE, a saliency- and coverage-oriented visual token pruning method for multimodal large language models. | large language model, multimodal | ✅ |
| 8 | AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts | Proposes AutoPrompt, which uses an LLM to automatically generate adversarial prompts for black-box red-teaming of text-to-image models. | large language model, zero-shot transfer |  |
| 9 | Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance | ProMoE: improves the image-generation performance of diffusion Transformers through explicit routing guidance. | large language model |  |
| 10 | OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents | OSWorld-MCP: a new benchmark for evaluating MCP tool invocation by computer-use agents. | multimodal |  |
| 11 | Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2 | Proposes UAP-SAM2, a cross-prompt universal adversarial attack method against SAM2. | foundation model |  |
|