DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning

Authors: Jialang Lu, Huayu Zhao, Huiyu Zhai, Xingxing Yang, Shini Han

Categories: cs.CV, cs.MM

Published: 2025-04-27

Notes: Accepted by ICMR 2025 Main track. Code is available at https://github.com/Wenyuzhy/DeepSPG

🔗 Code/Project: GitHub

💡 One-Sentence Summary

Proposes DeepSPG to address the lack of semantic information in low-light image enhancement.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

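Keywords: low-light image enhancement, deep learning, multimodal learning, semantic information, image decomposition, computer vision, deep semantic prior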

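📋 Key Points

Existing low-light image enhancement methods do not make effective use of region-level semantic information, so they perform poorly in extremely dark regions.
The paper proposes the DeepSPG framework, which combines image-level and text-level semantic priors and uses multimodal learning to improve low-light image enhancement.
Experiments show that DeepSPG outperforms state-of-the-art methods on five benchmark datasets.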

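📝 Abstract (Translated Summary)

It has long been held that high-level semantic learning benefits a wide range of computer vision tasks. In low-light image enhancement (LLIE), however, existing methods learn a brute-force mapping between the low-light and normal-light domains without considering the semantic information of different regions, especially extremely dark regions that suffer severe information loss. To address this, the paper proposes DeepSPG, a deep semantic prior-guided framework based on Retinex image decomposition that explores informative semantic knowledge through a pre-trained semantic segmentation model and multimodal learning. It combines image-level and text-level semantic priors into a multimodal learning framework with combinatorial deep semantic prior guidance, and demonstrates superior performance on five benchmark datasets.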

🔬 Method Details

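Problem definition: The paper targets the severe information loss in low-light image enhancement. Existing methods do not make effective use of region-level semantic information, which leads to unsatisfactory enhancement results.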

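Core idea: DeepSPG introduces image-level and text-level semantic priors and uses multimodal learning to guide the enhancement process, so that details and lost information are recovered more faithfully.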

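Technical framework: DeepSPG comprises three main components: image-level semantic prior guidance, text-level semantic prior guidance, and a multi-scale semantic-aware structure. The overall pipeline first performs Retinex image decomposition and then enhances the result under semantic guidance.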
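To make the Retinex step concrete, the sketch below decomposes an image I into reflectance R and illumination L such that I = R ⊙ L (element-wise). It is a minimal illustration and not DeepSPG's learned decomposition: the illumination estimate (per-pixel maximum over color channels), the `eps` constant, the gamma-based brightening, and the function names `retinex_decompose` and `enhance` are all assumptions made for this example.

```python
import numpy as np

def retinex_decompose(image, eps=1e-4):
    """Split an RGB image in [0, 1] into reflectance and illumination.

    Assumption: illumination is the per-pixel max over channels, a classic
    hand-crafted prior; DeepSPG's own decomposition is not specified here.
    """
    illumination = np.maximum(image.max(axis=2, keepdims=True), eps)  # (H, W, 1)
    reflectance = image / illumination                                 # (H, W, 3)
    return reflectance, illumination

def enhance(image, gamma=0.4):
    """Brighten by gamma-correcting the illumination only (illustrative)."""
    reflectance, illumination = retinex_decompose(image)
    return np.clip(reflectance * illumination ** gamma, 0.0, 1.0)

rng = np.random.default_rng(0)
low_light = rng.uniform(0.0, 0.1, size=(4, 4, 3))  # synthetic dark image
reflectance, illumination = retinex_decompose(low_light)
enhanced = enhance(low_light)
```

Because only the illumination map is adjusted, the reflectance (which carries color and texture) is left untouched. In a semantic-guided pipeline, this is the stage where region-level priors would steer how each area is brightened.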

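Key innovation: The main novelty is combining image-level and text-level semantic priors in a single multimodal learning framework, so that enhancement is guided by region semantics rather than by a direct low-light-to-normal-light mapping alone.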

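Key design: Hierarchical semantic features are extracted with a pre-trained semantic segmentation model, natural-language semantic constraints are integrated through a pre-trained vision-language model, and a multi-scale structure incorporates the semantic features into the enhancement network. Implementation details, including the loss functions and network architecture, are available with the released code.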
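One common way to impose a natural-language constraint with a vision-language model is to compare the embedding of the enhanced image against the embeddings of a positive and a negative text prompt. The sketch below assumes that CLIP-style formulation. The source does not specify DeepSPG's actual loss, so this should be read as a hypothetical illustration. The embeddings are random stand-in vectors rather than outputs of a real model, and the names `cosine_similarity` and `text_guidance_loss`, the prompts, and the `temperature` value are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def text_guidance_loss(image_emb, positive_emb, negative_emb, temperature=0.07):
    """Cross-entropy over two prompts (hypothetical formulation).

    The loss is lower when the image embedding is closer to the positive
    prompt than to the negative one.
    """
    logits = np.array([
        cosine_similarity(image_emb, positive_emb),
        cosine_similarity(image_emb, negative_emb),
    ]) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))

rng = np.random.default_rng(0)
positive_emb = rng.normal(size=16)  # stand-in for a prompt like "a well-lit photo"
negative_emb = rng.normal(size=16)  # stand-in for a prompt like "a dark photo"

# An image embedding near the positive prompt should incur a smaller loss
# than one near the negative prompt.
loss_good = text_guidance_loss(positive_emb + 0.1 * negative_emb, positive_emb, negative_emb)
loss_bad = text_guidance_loss(negative_emb + 0.1 * positive_emb, positive_emb, negative_emb)
```

In a real training loop the image embedding would come from the vision-language model's image encoder applied to the enhanced output, and the gradient of this loss would flow back into the enhancement network.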

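📊 Experimental Highlights

Experiments on five benchmark datasets show that DeepSPG outperforms state-of-the-art methods on low-light image enhancement, which supports the value of its semantic guidance. The abstract does not state specific improvement figures.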

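🎯 Application Scenarios

Potential applications include night photography, surveillance video enhancement, and medical image processing. By improving the quality of low-light images, DeepSPG can provide clearer inputs for downstream vision tasks.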

📄 Abstract (Original)

There has long been a belief that high-level semantics learning can benefit various downstream computer vision tasks. However, in the low-light image enhancement (LLIE) community, existing methods learn a brutal mapping between low-light and normal-light domains without considering the semantic information of different regions, especially in those extremely dark regions that suffer from severe information loss. To address this issue, we propose a new deep semantic prior-guided framework (DeepSPG) based on Retinex image decomposition for LLIE to explore informative semantic knowledge via a pre-trained semantic segmentation model and multimodal learning. Notably, we incorporate both image-level semantic prior and text-level semantic prior and thus formulate a multimodal learning framework with combinatorial deep semantic prior guidance for LLIE. Specifically, we incorporate semantic knowledge to guide the enhancement process via three designs: an image-level semantic prior guidance by leveraging hierarchical semantic features from a pre-trained semantic segmentation model; a text-level semantic prior guidance by integrating natural language semantic constraints via a pre-trained vision-language model; a multi-scale semantic-aware structure that facilitates effective semantic feature incorporation. Eventually, our proposed DeepSPG demonstrates superior performance compared to state-of-the-art methods across five benchmark datasets. The implementation details and code are publicly available at https://github.com/Wenyuzhy/DeepSPG.
