DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

📄 arXiv: 2606.06217v1

Authors: Tan Zhang, Quanyou Li, Lu Zhang, Jun Liu, Xiaofeng Zhu, Ping Hu

Categories: cs.CV, cs.AI

Published: 2026-06-04

🔗 Code/Project: https://github.com/TanmouTT/DisasterBench


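💡 One-sentence takeaway

Proposes DisasterBench, a benchmark for multi-stage multimodal reasoning in UAV-based disaster response in complex environments.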

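🎯 Matched areas: Pillar 2: RL Algorithms and Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models

Keywords: UAV, disaster response, multimodal reasoning, edge computing, reinforcement learning, natural disasters, emergency management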

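📋 Key points

  1. Existing multimodal benchmarks focus mainly on perception (e.g., recognition and description), cover few disaster types, and offer little support for the multi-stage reasoning that practical emergency response requires.
  2. The paper introduces DisasterBench, which covers 14 disaster-related scene types and 9 response-critical tasks, together with DisasterVL, a lightweight model aimed at edge inference.
  3. In experiments across 21 popular MLLMs, the 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models.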

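📝 Abstract (translated summary)

When a disaster unfolds, responders need to understand not only what is happening but also why, what will happen next, and what to do, often from low-altitude UAV views and with limited on-site compute. Existing multimodal benchmarks, however, focus mainly on perception, cover few disaster types, and give insufficient support for the multi-stage reasoning that real emergency response requires. The paper therefore proposes DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-based disaster response in complex environments. It covers 14 disaster-related scene types and 9 response-critical tasks, and explicitly tests causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To support edge inference, the paper also proposes DisasterVL, a lightweight multimodal model trained with domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. In the reported experiments it outperforms all evaluated open-source models and reaches GPT-4o-comparable reasoning accuracy with superior efficiency.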

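🔬 Method details

Problem definition: The paper targets a gap in existing multimodal benchmarks for disaster response, namely weak support for multi-stage reasoning in complex environments. Existing benchmarks mostly emphasize perception tasks and do not probe causal relationships or decision-making.

Core idea: The paper proposes the DisasterBench benchmark, which covers multiple disaster scene types and response tasks and emphasizes multi-stage reasoning. It also designs the lightweight DisasterVL model, which combines domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning to optimize reasoning.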

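Technical framework: DisasterBench comprises 14 disaster-related scene types and 9 response-critical tasks, organized across pre-, during-, and post-disaster stages with fine-grained disaster-task mappings. DisasterVL is trained with a three-stage pipeline: domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization.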
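The scene, task, and stage structure described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Sample` fields, the example scene and task labels, and the exact-match scoring rule are all hypothetical and are not taken from the DisasterBench release, whose actual data schema and metric may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three stages named in the abstract; the string labels are hypothetical.
STAGES = ("pre", "during", "post")


@dataclass(frozen=True)
class Sample:
    """One benchmark item (hypothetical schema, not the official one)."""
    scene: str   # one of the 14 scene types, e.g. "wildfire" (made-up label)
    task: str    # one of the 9 tasks, e.g. "causal_attribution" (made-up label)
    stage: str   # "pre", "during", or "post"
    answer: str  # reference answer

    def __post_init__(self):
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")


def accuracy_by(samples, predictions, key):
    """Exact-match accuracy grouped by a Sample attribute such as 'stage' or 'task'."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sample, pred in zip(samples, predictions):
        group = getattr(sample, key)
        totals[group] += 1
        # Assumed scoring rule: case-insensitive exact match.
        hits[group] += int(pred.strip().lower() == sample.answer.strip().lower())
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    samples = [
        Sample("wildfire", "causal_attribution", "during", "lightning"),
        Sample("flood", "propagation_prediction", "during", "north"),
        Sample("earthquake", "damage_analysis", "post", "severe"),
    ]
    predictions = ["Lightning", "south", "severe"]
    print(accuracy_by(samples, predictions, "stage"))
    print(accuracy_by(samples, predictions, "task"))
```

Grouping by `stage` or `task` rather than reporting one aggregate score mirrors the benchmark's stated aim of testing distinct reasoning abilities separately.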

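Key innovation: DisasterBench fills a gap in existing benchmarks with respect to multi-stage reasoning. DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models.

Key design: DisasterVL is a lightweight 2B-parameter model intended for edge inference under tight on-site compute constraints. The abstract does not detail its loss functions or network architecture.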
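To give a rough sense of why a 2B-parameter model suits edge hardware, the sketch below estimates the memory needed for the weights alone at several numeric precisions. The precisions are assumptions chosen for illustration; the source does not state which precision or quantization DisasterVL uses, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Bytes per parameter for common storage formats (illustrative assumptions;
# the paper's abstract does not say which one DisasterVL is deployed with).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Memory for model weights only, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(2e9, dtype):.2f} GiB")
```

At 16-bit precision the weights of a 2B-parameter model occupy roughly 3.7 GiB, which is the kind of budget that on-site devices can plausibly accommodate.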

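📊 Experimental highlights

In experiments across 21 popular MLLMs, the 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models, achieving GPT-4o-comparable reasoning accuracy with superior efficiency.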

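🎯 Application scenarios

Potential applications include emergency response to natural disasters, urban safety monitoring, and environmental monitoring. An efficient multimodal reasoning model that runs on site could help responders make faster, better-informed decisions in complex environments.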

📄 Abstract (original)

When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low-altitude UAV views and under tight on-site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/description), cover limited disaster types, and provide insufficient support for the multi-stage reasoning required in practical emergency response. We introduce DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-Based disaster response in complex environments. DisasterBench spans 14 disaster-related scene types and 9 response-critical tasks across pre-, during-, and post-disaster stages, with fine-grained disaster-task mappings that explicitly test causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To enable reasoning on the edge, we further propose DisasterVL, a lightweight multimodal model optimized with a three-stage pipeline combining domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. Experiments across 21 popular MLLMs show that our 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models, achieving GPT-4o-comparable reasoning accuracy with superior efficiency. The project page is available at https://github.com/TanmouTT/DisasterBench.