Few-Shot Object Detection via Spatial-Channel State Space Model

📄 arXiv: 2507.15308v1

Authors: Zhimeng Xin, Tianxu Wu, Yixiong Zou, Shiming Chen, Dingjie Fu, Xinge You

Categories: cs.CV

Published: 2025-07-21


💡 One-Sentence Summary

Proposes a spatial-channel state space model to address few-shot object detection.

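🎯 Matched areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 7: Motion Retargeting

Keywords: few-shot object detection, spatial-channel modeling, feature extraction, channel correlation, deep learning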

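📋 Key Points

  1. Existing few-shot object detection methods fall short in feature extraction; in particular, they misjudge which channels are effective.
  2. The paper proposes a new Spatial-Channel State Space Modeling (SCSM) module that improves feature extraction by modeling inter-channel correlation.
  3. Experiments on the VOC and COCO datasets show that the SCSM module significantly improves detection performance and reaches the state of the art.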

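📝 Abstract (translated from the Chinese)

Because training samples are limited in few-shot object detection (FSOD), current methods struggle to extract effective features. Concretely, channels with high weights are not necessarily effective, while channels with low weights may still carry significant value. To address this, the paper uses inter-channel correlation to help the model adapt to novel conditions, so that it correctly highlights effective channels and rectifies ineffective ones. On this basis, it proposes the Spatial-Channel State Space Modeling (SCSM) module, which emphasizes effective patterns and corrects ineffective ones. Experiments show that the SCSM module significantly improves the quality of channel feature representations and achieves state-of-the-art performance on the VOC and COCO datasets.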

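🔬 Method Details

Problem definition: The paper addresses the shortcomings of existing few-shot object detection methods in channel feature extraction, specifically in judging whether high-weight and low-weight channels are actually effective.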

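Core idea: Exploit inter-channel correlation through a Spatial-Channel State Space Modeling (SCSM) module, so that feature extraction highlights effective channels and rectifies ineffective ones.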
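One simple way to see what "inter-channel correlation" can reveal is to compare per-channel activations across a handful of samples. The sketch below uses Pearson correlation purely as an illustrative measure; the summary does not say which correlation measure the paper relies on, and the helper names `pearson` and `correlation_matrix` and the sample activations are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)

def correlation_matrix(channels):
    """Pairwise correlation between channels; channels[i] holds one value per sample."""
    return [[pearson(ci, cj) for cj in channels] for ci in channels]

# Hypothetical activations of three channels over four samples.
channels = [
    [0.9, 0.1, 0.8, 0.2],      # strongly activated channel
    [0.09, 0.01, 0.08, 0.02],  # weakly activated, but follows the same pattern
    [0.5, 0.5, 0.5, 0.5],      # moderately activated, but constant
]
corr = correlation_matrix(channels)
```

In this toy data the second channel has small values yet tracks the first one almost perfectly, while the third has larger values and carries no varying signal. That is the kind of case where magnitude alone would misjudge which channel is useful.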

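Technical framework: The overall architecture consists of a Spatial Feature Modeling (SFM) module and a Mamba-based Channel State Modeling (CSM) module. The former balances the learning of spatial relationships and channel relationships; the latter focuses on modeling inter-channel correlation.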
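The idea of treating channels as a 1-dimensional sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each channel is pooled to a single scalar, and it swaps Mamba's selective scan for a fixed-parameter linear state-space recurrence run in both directions along the channel axis. The SFM stage is omitted because the summary does not describe its internals. The names `channel_descriptors`, `ssm_scan`, and `channel_state_model`, the parameters `a`, `b`, `c`, and the sigmoid gating are all assumptions made for the example.

```python
import math

def channel_descriptors(feature_map):
    """Average each H x W channel of a C x H x W nested list into one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def ssm_scan(seq, a=0.9, b=0.1, c=1.0):
    """Fixed-parameter linear SSM: h_k = a*h_{k-1} + b*x_k, y_k = c*h_k."""
    h, out = 0.0, []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out

def channel_state_model(feature_map):
    """Rescale every channel by a gate computed from scans over the channel sequence."""
    desc = channel_descriptors(feature_map)
    forward = ssm_scan(desc)                # state carries channels 0..k
    backward = ssm_scan(desc[::-1])[::-1]   # state carries channels k..C-1
    gates = [1.0 / (1.0 + math.exp(-(f + r))) for f, r in zip(forward, backward)]
    return [
        [[g * v for v in row] for row in channel]
        for g, channel in zip(gates, feature_map)
    ]

# Hypothetical 3-channel, 2 x 2 feature map.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
rescaled = channel_state_model(features)
```

Because each gate is computed from the scan state rather than from the channel's own descriptor alone, the middle channel above receives a gate greater than the 0.5 it would get if every channel were zero. A real Mamba block would additionally make the recurrence parameters depend on the input.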

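Key innovation: The design of the SCSM module is the paper's central contribution. By using inter-channel correlation to improve feature extraction, it identifies effective channels more accurately than existing methods.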

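Key design: In the SFM module, a dedicated loss function is designed to balance the learning of spatial and channel features; the CSM module uses Mamba to capture dynamic inter-channel correlation, supporting the model's adaptation to novel conditions.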
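For reference, Mamba-style models build on a discretized linear state-space recurrence. The form below is the standard zero-order-hold discretization; indexing it by channel position $k$ rather than by time step is an assumption drawn from the abstract's remark that the channel sequence is 1-dimensional like a temporal one, and the paper's exact parameterization may differ.

```latex
\begin{aligned}
h_k &= \bar{A}\, h_{k-1} + \bar{B}\, x_k, \\
y_k &= C\, h_k, \\
\bar{A} &= \exp(\Delta A), \qquad
\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B .
\end{aligned}
```

Here $x_k$ is the input at position $k$, $h_k$ is the hidden state, and $y_k$ is the output. In Mamba, the step size $\Delta$ and the matrices $B$ and $C$ are computed from the input, which is what makes the scan selective.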

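📊 Experimental Highlights

The experiments show that the SCSM module yields significant gains on the VOC and COCO datasets, which supports its effectiveness for few-shot object detection; the improvement in detection accuracy over baseline methods is given only as X%.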

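🎯 Application Scenarios

Potential application areas include intelligent surveillance, autonomous driving, and robot vision, where the method could improve the accuracy and robustness of object detection when samples are scarce. As datasets grow and models are further optimized, the approach may find use in more real-world scenarios.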

📄 Abstract (Original)

Due to the limited training samples in few-shot object detection (FSOD), we observe that current methods may struggle to accurately extract effective features from each channel. Specifically, this issue manifests in two aspects: i) channels with high weights may not necessarily be effective, and ii) channels with low weights may still hold significant value. To handle this problem, we consider utilizing the inter-channel correlation to facilitate the novel model's adaptation process to novel conditions, ensuring the model can correctly highlight effective channels and rectify those incorrect ones. Since the channel sequence is also 1-dimensional, its similarity with the temporal sequence inspires us to take Mamba for modeling the correlation in the channel sequence. Based on this concept, we propose a Spatial-Channel State Space Modeling (SCSM) module for spatial-channel state modeling, which highlights the effective patterns and rectifies those ineffective ones in feature channels. In SCSM, we design the Spatial Feature Modeling (SFM) module to balance the learning of spatial relationships and channel relationships, and then introduce the Channel State Modeling (CSM) module based on Mamba to learn correlation in channels. Extensive experiments on the VOC and COCO datasets show that the SCSM module enables the novel detector to improve the quality of focused feature representation in channels and achieve state-of-the-art performance.