CAMF-Det: Closure-Aware Multimodal Fusion for LiDAR-Camera 3D Object Detection on UAV Platforms
Authors: Yanze Jiang, Yanfeng Gu, Xian Li
Category: cs.CV
Published: 2026-06-08
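💡 One-Sentence Takeaway
Proposes CAMF-Det to address ground-object occlusion in LiDAR-camera 3D object detection on UAV platforms.
🎯 Matched area: Pillar 9: Embodied Foundation Models
Keywords: multimodal fusion, 3D object detection, UAV, occlusion modeling, LiDAR, camera, deep learning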
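📋 Key Points
- Existing multimodal fusion methods do not explicitly handle ground-object occlusion in UAV scenes, which degrades their performance.
- CAMF-Det derives dual-modal occlusion intensity through physics-inspired modeling and embeds it as a prior throughout the detection pipeline.
- On the self-built SI3D-DI and SI3D-DII datasets, CAMF-Det achieves the best performance at every difficulty level.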
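📝 Abstract (Summary)
Multimodal 3D object detection based on LiDAR and cameras performs well in ground-vehicle scenarios but remains underexplored on UAV platforms. In top-down UAV scenes, ground-object occlusion dominated by tree canopies degrades the available information. Existing multimodal fusion frameworks do not model this occlusion explicitly, which limits their performance in UAV scenes. The paper therefore proposes CAMF-Det, a closure-aware multimodal fusion framework that derives dual-modal occlusion intensity through physics-inspired modeling and embeds it in the detection pipeline. Experiments on two self-built multimodal datasets show that CAMF-Det performs best overall, with hard-level improvements of 9.43% and 4.88% over the best competing methods.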
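🔬 Method Details
Problem definition: The paper targets ground-object occlusion in LiDAR-camera multimodal 3D object detection on UAV platforms. Existing methods do not model occlusion explicitly, so their performance falls short in occluded scenes.
Core idea: CAMF-Det uses physics-inspired modeling to derive occlusion intensity for both modalities and injects it as a prior throughout the detection pipeline to make detection more robust.
Framework: The overall architecture has three main modules: a dual-modal closure modeling module, a dual-modal prediction network, and a detection head. The closure modeling module constructs occlusion intensity ground truth offline, and the prediction network, supervised by that ground truth, produces online occlusion intensity predictions under single-frame inference.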
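Key innovation: The main contribution is to model occlusion intensity explicitly and feed it into the detection pipeline, in contrast to existing methods that handle occlusion only implicitly. This improves detection in heavily occluded environments.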
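One simple way to picture an explicit occlusion prior acting on fusion is a per-cell reweighting of the two modalities. The sketch below is an illustration only and is not CAMF-Det's fusion module: the function name `fuse_cell`, the "one minus occlusion" visibility weighting, and the `eps` stabilizer are all assumptions introduced here.

```python
def fuse_cell(lidar_feat, camera_feat, lidar_occ, camera_occ, eps=1e-6):
    """Fuse two feature vectors for one BEV cell (hypothetical scheme).

    lidar_occ and camera_occ are occlusion intensities in [0, 1];
    a modality that is more occluded receives a smaller weight.
    """
    # Assumption: visibility is modeled as (1 - occlusion intensity).
    w_lidar = 1.0 - lidar_occ
    w_camera = 1.0 - camera_occ

    # Normalize so the weights sum to (almost) one; eps avoids division
    # by zero when both modalities are fully occluded.
    total = w_lidar + w_camera + eps
    w_lidar /= total
    w_camera /= total

    # Weighted sum of the two feature vectors, element by element.
    return [w_lidar * l + w_camera * c for l, c in zip(lidar_feat, camera_feat)]
```

With this weighting, a cell whose camera view is fully blocked falls back almost entirely on the LiDAR feature, and a cell where both modalities are fully occluded yields a near-zero fused feature.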
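Key design: The occlusion intensity ground truth is constructed with a Beer-Lambert-inspired formulation and building-mask correction. Occlusion intensity is then injected into data augmentation, feature encoding, multimodal fusion, and the detection head, so that detection adapts to spatially varying and modality-dependent information degradation.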
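The Beer-Lambert-inspired construction can be illustrated with a minimal sketch. This summary does not give the paper's exact formula, so everything below is an assumption: the attenuation form 1 - exp(-k * d), the extinction coefficient k, the per-cell canopy path length d, and the choice to set cells covered by the building mask to zero. The names `occlusion_intensity` and `extinction_coeff` are hypothetical.

```python
import math


def occlusion_intensity(canopy_path_lengths, building_mask, extinction_coeff=0.5):
    """Compute a per-cell occlusion intensity grid in [0, 1).

    canopy_path_lengths: 2D list of path lengths through canopy (assumed input).
    building_mask: 2D list of booleans, True where a cell is a building.
    extinction_coeff: assumed attenuation coefficient k in exp(-k * d).
    """
    grid = []
    for length_row, mask_row in zip(canopy_path_lengths, building_mask):
        out_row = []
        for path_len, is_building in zip(length_row, mask_row):
            if is_building:
                # Assumption: building cells are not canopy, so the
                # canopy-induced occlusion is corrected to zero there.
                out_row.append(0.0)
            else:
                # Beer-Lambert law: transmitted fraction decays
                # exponentially with the path length through the medium.
                transmittance = math.exp(-extinction_coeff * path_len)
                out_row.append(1.0 - transmittance)
        grid.append(out_row)
    return grid
```

Under these assumptions, a cell with no canopy yields an intensity of 0, and the value saturates toward 1 as the canopy path grows. One could, for example, use a different coefficient for each modality to obtain separate LiDAR and camera maps.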
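📊 Experimental Highlights
On the self-built SI3D-DI and SI3D-DII datasets, CAMF-Det improves hard-level mAP$_{\mathrm{BEV}}$ by 9.43% and 4.88%, respectively, over the best competing methods, which supports the effectiveness of explicit occlusion prior modeling.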
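🎯 Application Scenarios
Potential applications include UAV surveillance, autonomous driving, and environmental monitoring. By improving detection accuracy in heavily occluded scenes, CAMF-Det could provide more reliable support for intelligent UAV applications in urban environments.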
📄 Abstract (Original)
Multimodal 3D object detection based on LiDAR and cameras has demonstrated excellent performance in ground-vehicle scenarios, but has not been explored for Unmanned Aerial Vehicle (UAV) platforms. In UAV top-down scenes, frequent ground-object occlusion dominated by tree canopies causes spatially varying and modality-dependent information degradation. Existing multimodal fusion frameworks neither explicitly model such ground-object occlusion nor embed occlusion awareness into the detection pipeline, limiting their performance in occluded UAV scenes. To address these challenges, we propose CAMF-Det, a closure-aware multimodal fusion framework for LiDAR-camera 3D object detection on UAV platforms, which derives dual-modal occlusion intensity through physics-inspired modeling and embeds it as a prior throughout the detection pipeline. First, a dual-modal closure modeling module explicitly constructs occlusion intensity ground truth for both modalities offline via a Beer-Lambert-inspired formulation and building-mask correction. Second, using these ground-truth maps as supervision, a dual-modal prediction network converts the offline modeling results into online occlusion intensity predictions under single-frame inference. Third, both ground-truth and predicted occlusion intensity are injected into data augmentation, feature encoding, multimodal fusion, and the detection head, enabling adaptive detection under spatially varying and modality-dependent information degradation. Experiments on two self-built UAV-based multimodal datasets, SI3D-DI and SI3D-DII, demonstrate that CAMF-Det achieves the best performance across all difficulty levels, with hard-level mAP$_{\mathrm{BEV}}$ improvements of 9.43% and 4.88% over the best competing methods, respectively. These results confirm the effectiveness of explicit occlusion prior modeling and exploitation for robust multimodal 3D detection in UAV scenes.