Adapting Foundation Model for Dental Caries Detection with Dual-View Co-Training

Authors: Tao Luo, Han Wu, Tong Yang, Dinggang Shen, Zhiming Cui

Category: cs.CV

Published: 2025-08-28

🔗 Code/Project: https://github.com/ShanghaiTech-IMPACT/DVCTNet

💡 One-Sentence Takeaway

Proposes DVCTNet, which uses dual-view co-training to improve the accuracy of dental caries detection.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: caries detection, dual-view learning, co-training, medical image analysis, deep learning, attention mechanism, panoramic X-ray

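📋 Key Points

Existing caries detection methods lose accuracy on subtle contrast variations and diverse lesion morphology, which hampers early diagnosis.
DVCTNet combines two views, the global panoramic X-ray and local cropped tooth images, and improves detection performance through co-training.
Experiments show that DVCTNet outperforms existing methods on both a public dataset and a newly curated dataset, indicating potential for clinical use.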

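📝 Abstract (Translated)

This paper proposes DVCTNet, a novel Dual-View Co-Training network for accurate dental caries detection. It targets the suboptimal accuracy of existing methods, which is caused by the subtle contrast variations and diverse lesion morphology of dental caries. DVCTNet first applies automated tooth detection to establish two complementary views: a global view from panoramic X-ray images and a local view from cropped tooth images. A vision foundation model is then pretrained separately on each view. The global-view foundation model serves as the detection backbone and generates region proposals and global features, while the local-view model extracts detailed features from the cropped tooth patches matched to those region proposals. To integrate information from both views, the authors introduce a Gated Cross-View Attention (GCV-Atten) module that dynamically fuses the dual-view features; the fused features are fed back into the detection model for the final caries detection. DVCTNet is tested on a public dataset and further validated on a newly curated, high-precision dental caries detection dataset whose annotations were double-verified using both intra-oral images and panoramic X-rays. Experimental results show that DVCTNet outperforms existing state-of-the-art methods on both datasets, indicating the clinical applicability of the method.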

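🔬 Method Details

Problem definition: The paper addresses the limited accuracy of dental caries detection in panoramic X-rays. Existing methods struggle with the subtle contrast variations and diverse lesion morphology of caries, which leads to unsatisfactory detections and hinders early diagnosis and treatment.

Core idea: The approach mimics a dentist's diagnostic workflow: first screen the whole panoramic X-ray, then inspect each tooth in detail. The paper therefore adopts dual-view co-training, combining the global information of the panoramic X-ray with the local detail of cropped tooth images to improve detection accuracy.

Technical framework: DVCTNet consists of the following main modules: 1) an automated tooth detection module that locates each tooth in the panoramic X-ray; 2) a global-view foundation model that extracts global features from the panoramic X-ray and generates region proposals; 3) a local-view foundation model that extracts detailed local features from the cropped tooth images; 4) a Gated Cross-View Attention (GCV-Atten) module that dynamically fuses the global and local features; 5) a caries detection module that performs the final detection from the fused features.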
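The framework pairs each region proposal from the global view with a cropped tooth patch from the local view. The summary does not state how this pairing is computed, so the following is only a minimal sketch under an assumed rule: a proposal is assigned to the tooth box that contains the largest fraction of the proposal's area. The names `overlap_fraction`, `match_proposals_to_teeth`, and the `min_overlap` threshold are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def overlap_fraction(proposal: Box, tooth: Box) -> float:
    """Fraction of the proposal's area that lies inside the tooth box."""
    ix1, iy1 = max(proposal[0], tooth[0]), max(proposal[1], tooth[1])
    ix2, iy2 = min(proposal[2], tooth[2]), min(proposal[3], tooth[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (proposal[2] - proposal[0]) * (proposal[3] - proposal[1])
    return inter / area if area > 0 else 0.0


def match_proposals_to_teeth(
    proposals: List[Box],
    teeth: List[Box],
    min_overlap: float = 0.5,  # assumed threshold, not from the paper
) -> List[Optional[int]]:
    """Return, for each proposal, the index of its best tooth box or None."""
    matches: List[Optional[int]] = []
    for p in proposals:
        best_idx, best_frac = None, 0.0
        for i, t in enumerate(teeth):
            frac = overlap_fraction(p, t)
            if frac > best_frac:
                best_idx, best_frac = i, frac
        # Proposals that barely touch any tooth stay unmatched.
        matches.append(best_idx if best_frac >= min_overlap else None)
    return matches


# Two adjacent tooth boxes and three proposals (the last one lies outside both).
teeth = [(0, 0, 100, 200), (100, 0, 200, 200)]
proposals = [(10, 10, 30, 30), (150, 50, 170, 80), (300, 300, 310, 310)]
matches = match_proposals_to_teeth(proposals, teeth)
```

The sketch uses the fraction of the proposal covered by the tooth rather than plain IoU because a caries proposal is usually much smaller than the tooth that contains it, so IoU would be small even for a correct match.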

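Key innovations: The paper's main contributions are the dual-view co-training framework and the Gated Cross-View Attention (GCV-Atten) module. The co-training framework exploits both global and local information to improve detection accuracy. The GCV-Atten module fuses the dual-view features dynamically, adaptively weighting global and local information so that the model can handle different caries shapes and locations.

Key design: The global and local views each use their own vision foundation model, pretrained separately on that view. The GCV-Atten module uses a gating mechanism that adjusts the weights of global and local information according to the input features. The loss function comprises a detection loss and a cross-view consistency loss that constrains the feature representations of the two views.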
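To make the gating idea concrete, here is a minimal NumPy sketch of a gated cross-attention fusion for a single region proposal. It is an illustration under stated assumptions, not the authors' GCV-Atten implementation: the global feature acts as the query, the local-view tokens act as keys and values, learned query/key/value projections are omitted for brevity, and a sigmoid gate forms a convex combination of the two views. The function name `gated_cross_view_fusion` and the parameters `w_gate` and `b_gate` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_cross_view_fusion(global_feat, local_tokens, w_gate, b_gate):
    """Fuse one global feature with local tokens through a learned gate.

    global_feat:  (d,)     feature of a region proposal from the global view
    local_tokens: (n, d)   features of the matched tooth crop (local view)
    w_gate:       (2d, d)  gate weights (hypothetical parameter)
    b_gate:       (d,)     gate bias (hypothetical parameter)
    """
    d = global_feat.shape[0]
    # Cross-view attention: the global feature queries the local tokens.
    scores = local_tokens @ global_feat / np.sqrt(d)  # (n,)
    attn = softmax(scores)                            # (n,), sums to 1
    attended = attn @ local_tokens                    # (d,)
    # The gate looks at both views and decides, per channel, how much
    # local detail to mix into the global feature.
    gate = sigmoid(np.concatenate([global_feat, attended]) @ w_gate + b_gate)
    fused = gate * attended + (1.0 - gate) * global_feat
    return fused, gate


# Example with random features: d = 8 channels, n = 5 local tokens.
rng = np.random.default_rng(0)
g = rng.normal(size=8)
tokens = rng.normal(size=(5, 8))
w = rng.normal(size=(16, 8)) * 0.1
b = np.zeros(8)
fused, gate = gated_cross_view_fusion(g, tokens, w, b)
```

When the gate saturates near 0 the fused feature falls back to the global feature, and near 1 it is dominated by the attended local detail, which is one simple way to realize the adaptive weighting described above.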

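📊 Experimental Highlights

Experimental results show that DVCTNet outperforms existing state-of-the-art methods on both the public dataset and the newly curated dataset. On the curated dataset, its mean average precision (mAP) exceeds that of the existing state-of-the-art methods, supporting the method's effectiveness in practical clinical use. The code and the labeled dataset have been released publicly.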

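🎯 Application Scenarios

The work can be applied in dentistry to assist dentists with the early diagnosis and treatment of caries. More accurate and efficient caries detection can reduce missed diagnoses and misdiagnoses, which in turn improves patients' oral health. The method could also be extended to other medical image analysis tasks, such as bone disease detection and tumor detection.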

📄 Abstract (Original)

Accurate dental caries detection from panoramic X-rays plays a pivotal role in preventing lesion progression. However, current detection methods often yield suboptimal accuracy due to subtle contrast variations and diverse lesion morphology of dental caries. In this work, inspired by the clinical workflow where dentists systematically combine whole-image screening with detailed tooth-level inspection, we present DVCTNet, a novel Dual-View Co-Training network for accurate dental caries detection. Our DVCTNet starts with employing automated tooth detection to establish two complementary views: a global view from panoramic X-ray images and a local view from cropped tooth images. We then pretrain two vision foundation models separately on the two views. The global-view foundation model serves as the detection backbone, generating region proposals and global features, while the local-view model extracts detailed features from corresponding cropped tooth patches matched by the region proposals. To effectively integrate information from both views, we introduce a Gated Cross-View Attention (GCV-Atten) module that dynamically fuses dual-view features, enhancing the detection pipeline by integrating the fused features back into the detection model for final caries detection. To rigorously evaluate our DVCTNet, we test it on a public dataset and further validate its performance on a newly curated, high-precision dental caries detection dataset, annotated using both intra-oral images and panoramic X-rays for double verification. Experimental results demonstrate DVCTNet's superior performance against existing state-of-the-art (SOTA) methods on both datasets, indicating the clinical applicability of our method. Our code and labeled dataset are available at https://github.com/ShanghaiTech-IMPACT/DVCTNet.
