DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

Authors: Tan Zhang, Quanyou Li, Lu Zhang, Jun Liu, Xiaofeng Zhu, Ping Hu

Categories: cs.CV, cs.AI

Published: 2026-06-04

🔗 Code/Project: GitHub (https://github.com/TanmouTT/DisasterBench)

💡 One-Sentence Takeaway

Introduces DisasterBench to address multimodal reasoning for UAV-based disaster response in complex environments.

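🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

Keywords: UAV, disaster response, multimodal reasoning, edge computing, reinforcement learning, natural disasters, emergency management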

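📋 Key Points

Existing multimodal benchmarks focus mainly on perception tasks and offer little support for multi-stage reasoning in complex disaster scenarios, which limits their practical use.
The paper proposes the DisasterBench benchmark, covering 14 disaster-related scene types and 9 response-critical tasks, and introduces the lightweight DisasterVL model for accurate, efficient reasoning on the edge.
In experiments across 21 popular MLLMs, the 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models.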

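📝 Abstract (Summary)

When a disaster strikes, responders need not only to understand what is happening but also to analyze why, predict what comes next, and decide how to act, often from low-altitude UAV views and with limited compute. Existing multimodal benchmarks, however, focus mainly on perception, cover few disaster types, and give insufficient support for the multi-stage reasoning that real emergency response requires. To address this, the paper proposes DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-based disaster response in complex environments. It covers 14 disaster scene types and 9 response-critical tasks, and explicitly tests causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To support inference on the edge, the paper also proposes DisasterVL, a lightweight multimodal model trained with domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. Experiments show that it outperforms all evaluated open-source models, reaching GPT-4o-comparable reasoning accuracy with superior efficiency.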

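🔬 Method Details

Problem definition: The paper targets the shortcomings of existing multimodal benchmarks for disaster response, in particular their weak support for multi-stage reasoning in complex environments. Existing work concentrates on perception tasks (e.g., recognition and description) and does not probe causal relationships or decision-making.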

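Core idea: The paper proposes the DisasterBench benchmark, which covers a range of disaster scenes and response tasks and emphasizes multi-stage reasoning. It also designs DisasterVL, a lightweight model that combines domain instruction tuning with reinforcement learning to improve reasoning.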

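Technical framework: DisasterBench comprises 14 disaster-related scene types and 9 response-critical tasks spanning the pre-, during-, and post-disaster stages, with fine-grained disaster-task mappings. DisasterVL is trained with a three-stage pipeline: domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization.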
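To make the stage-and-task structure concrete, here is a minimal Python sketch of how one benchmark item and a per-stage accuracy score might be represented. Everything in it is an assumption for illustration: the `BenchmarkItem` fields, the `accuracy_by_stage` helper, the exact-match scoring rule, and the toy scene and task labels are hypothetical and are not taken from the released DisasterBench schema or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The three disaster stages named in the paper's abstract."""
    PRE = "pre-disaster"
    DURING = "during-disaster"
    POST = "post-disaster"


@dataclass(frozen=True)
class BenchmarkItem:
    # Hypothetical schema: field names are illustrative, not the official format.
    scene_type: str  # e.g. one of the 14 disaster-related scene types
    task: str        # e.g. one of the 9 response-critical tasks
    stage: Stage
    question: str
    answer: str      # reference answer, here a multiple-choice letter


def accuracy_by_stage(items, predictions):
    """Return {Stage: accuracy}, scoring by case-insensitive exact match.

    Exact match is an assumed scoring rule chosen for simplicity.
    Stages with no items are omitted from the result.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.stage] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.stage] += 1
    return {stage: correct[stage] / total[stage] for stage in total}


# Toy data: scene and task labels below are made-up placeholders.
toy_items = [
    BenchmarkItem("wildfire", "propagation prediction", Stage.DURING,
                  "Which way will the fire front move?", "B"),
    BenchmarkItem("flood", "damage analysis", Stage.POST,
                  "Which structure shows the most damage?", "A"),
    BenchmarkItem("wildfire", "causal attribution", Stage.DURING,
                  "What most likely started the fire?", "C"),
]
toy_predictions = ["b", "C", "C "]
toy_scores = accuracy_by_stage(toy_items, toy_predictions)
```

Grouping scores by stage, rather than reporting a single number, mirrors the benchmark's emphasis on pre-, during-, and post-disaster reasoning; the same helper could be keyed on `task` or `scene_type` for a finer breakdown.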

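Key innovation: DisasterBench fills a gap in existing benchmarks around multi-stage reasoning. DisasterVL outperforms all evaluated open-source models and approaches state-of-the-art closed-source models, with GPT-4o-comparable reasoning accuracy and superior efficiency.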

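Key design: DisasterVL has 2B parameters and is sized for efficient inference under edge-compute constraints. The abstract does not detail its loss functions or network architecture.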

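📊 Experimental Highlights

Across 21 popular MLLMs, the 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models. It reaches reasoning accuracy comparable to GPT-4o with superior efficiency, which points to its potential for on-site use.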

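🎯 Application Scenarios

Potential applications include emergency response to natural disasters, urban safety monitoring, and environmental monitoring. An efficient multimodal reasoning tool can help responders make fast decisions in complex environments and improve the speed and accuracy of disaster response.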

📄 Abstract (Original)

When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low-altitude UAV views and under tight on-site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/description), cover limited disaster types, and provide insufficient support for the multi-stage reasoning required in practical emergency response. We introduce DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-Based disaster response in complex environments. DisasterBench spans 14 disaster-related scene types and 9 response-critical tasks across pre-, during-, and post-disaster stages, with fine-grained disaster-task mappings that explicitly test causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To enable reasoning on the edge, we further propose DisasterVL, a lightweight multimodal model optimized with a three-stage pipeline combining domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. Experiments across 21 popular MLLMs show that our 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models, achieving GPT-4o-comparable reasoning accuracy with superior efficiency. The project page is available at https://github.com/TanmouTT/DisasterBench.
