Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion

📄 arXiv: 2404.03070v1

Authors: Su Sun, Cheng Zhao, Yuliang Guo, Ruoyu Wang, Xinyu Huang, Yingjie Victor Chen, Liu Ren

Category: cs.CV

Published: 2024-04-03


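💡 One-Sentence Summary

Proposes an indoor 3D reconstruction method that completes occluded surfaces from a sequence of depth readings.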

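🎯 Matched area: Pillar 3: Spatial Perception and Semantics (Perception & Semantics)

Keywords: 3D reconstruction, occlusion completion, deep learning, geometric inference, indoor scenes, virtual reality, augmented reality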

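📋 Key Points

  1. Existing 3D reconstruction methods focus on visible regions and neglect surfaces hidden by occlusion, so the reconstructed scene is incomplete.
  2. The paper proposes an occluded-surface completion method driven by depth measurements: it learns a geometry prior from complete scenes and uses it to infer the occluded geometry of unseen scenes.
  3. On the 3D-CRS and iTHOR datasets, the method improves 3D reconstruction completeness over prior state-of-the-art methods by 16.8% and 24.2%, respectively.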

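📝 Abstract (translated from Chinese)

This paper proposes a novel indoor 3D reconstruction method that completes occluded surfaces from a sequence of depth readings. Existing state-of-the-art methods reconstruct only the visible regions of a scene and neglect regions hidden by occlusion, such as the contact surfaces between pieces of furniture and occluded walls and floors. The method learns a 3D geometry prior from a variety of complete scenes and uses it to infer the occluded geometry of unseen scenes. It pairs a coarse-fine hierarchical octree representation with a dual-decoder architecture, the Geo-decoder and the 3D Inpainter, which jointly reconstruct the complete 3D scene geometry. Experiments on the 3D Completed Room Scene (3D-CRS) and iTHOR datasets show that the method significantly outperforms existing methods, with gains in 3D reconstruction completeness of 16.8% and 24.2%, respectively.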

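🔬 Method Details

Problem definition: The paper addresses surfaces in indoor 3D reconstruction that never appear in the input because they are occluded. Existing methods reconstruct only the visible regions and cannot recover the occluded surfaces, so the resulting scene mesh is incomplete.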
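
To make the visible/occluded distinction concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the 2D occupancy grid, the left-to-right ray direction, and the function name `split_visible_occluded` are all assumptions chosen for illustration. It shows that a depth sensor only ever measures the first occupied cell along each ray, so everything behind it produces no reading.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def split_visible_occluded(grid: List[str]) -> Tuple[List[Cell], List[Cell]]:
    """Cast one ray per row from the left edge of a 2D occupancy grid.

    '#' marks an occupied cell and '.' marks free space. The first occupied
    cell a ray meets is what a depth sensor would measure; occupied cells
    behind it never produce a depth reading.
    """
    visible: List[Cell] = []
    occluded: List[Cell] = []
    for row, line in enumerate(grid):
        hit = False
        for col, ch in enumerate(line):
            if ch != "#":
                continue
            if hit:
                occluded.append((row, col))
            else:
                visible.append((row, col))
                hit = True
    return visible, occluded


# Hypothetical top-down slice: a cabinet (columns 2-3) in front of a wall (column 5).
scene = [
    "..##.#",
    "..##.#",
    ".....#",
]
visible, occluded = split_visible_occluded(scene)
```

In this toy scene the wall cells in rows 0 and 1 end up in `occluded`, which is the kind of geometry a visible-only reconstruction would leave as a hole.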

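Core idea: The method learns a 3D geometry prior from a variety of complete scenes and, given only depth measurements, uses that prior to infer the occluded geometry of an unseen scene. A dual-decoder architecture handles visible and occluded surfaces separately.

Technical framework: The overall architecture consists of a coarse-fine hierarchical octree representation and two main modules, the Geo-decoder and the 3D Inpainter. The Geo-decoder reconstructs visible surfaces, while the 3D Inpainter completes occluded surfaces.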
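
As an illustration of what a coarse-fine hierarchy over space looks like, the sketch below computes the voxel index that contains a point at every octree level. This is a generic octree indexing scheme, not the paper's data structure; the function name `octree_keys`, the cubic root volume, and the number of levels are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Key = Tuple[int, int, int]


def octree_keys(point: Point, origin: Point, root_size: float, max_level: int) -> List[Key]:
    """Return the integer voxel index containing `point` at levels 0..max_level.

    Level 0 is the single root cube; each deeper level halves the cell size,
    so level L has 2**L cells per axis.
    """
    keys: List[Key] = []
    for level in range(max_level + 1):
        cells_per_axis = 2 ** level
        cell_size = root_size / cells_per_axis
        idx = []
        for p, o in zip(point, origin):
            i = int((p - o) // cell_size)
            # Clamp so a point on the far boundary stays inside the root cube.
            idx.append(min(max(i, 0), cells_per_axis - 1))
        keys.append((idx[0], idx[1], idx[2]))
    return keys


# Hypothetical 8 m root cube with three subdivision levels.
keys = octree_keys((5.0, 1.0, 7.5), (0.0, 0.0, 0.0), root_size=8.0, max_level=3)
```

Each key at level L maps to its parent at level L-1 by halving every index (integer division by 2). In a design like the one summarized above, features stored at the shallow levels would feed the coarse module and features at the deep levels would feed the fine module; where that split falls is not specified in this summary.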

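Key innovation: The main novelty is a dual-decoder architecture that combines online optimization with offline training. The Geo-decoder is optimized for each individual scene, while the 3D Inpainter generalizes across scenes.

Key design: The Geo-decoder operates on the detailed representation at the fine levels and is optimized online; the 3D Inpainter operates on the abstract representation at the coarse levels and is trained offline. The loss design is reported to account for reconstruction completeness and geometric consistency so that the completed surfaces are accurate.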
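
The division of labor between a per-scene online fit and a fixed offline prior can be sketched with a deliberately simple 1D toy. Nothing here reproduces the paper's networks: the "prior" is just a per-position mean over complete training profiles, the "online fit" is plain gradient descent on observed samples, and all function names and numbers are hypothetical.

```python
from typing import List, Optional


def train_prior_offline(complete_scenes: List[List[float]]) -> List[float]:
    """Offline stage (toy): average complete profiles into a fixed prior."""
    n = len(complete_scenes)
    return [sum(column) / n for column in zip(*complete_scenes)]


def fit_visible_online(depth: List[Optional[float]], steps: int = 200, lr: float = 0.1) -> List[float]:
    """Online stage (toy): per-scene gradient descent on squared error,
    applied only where a depth sample exists (None means occluded)."""
    est = [0.0] * len(depth)
    for _ in range(steps):
        for i, d in enumerate(depth):
            if d is not None:
                est[i] -= lr * 2.0 * (est[i] - d)
    return est


def complete_scene(depth: List[Optional[float]], prior: List[float]) -> List[float]:
    """Keep the online fit where observed; fall back to the prior where occluded."""
    online = fit_visible_online(depth)
    return [o if d is not None else p for o, d, p in zip(online, depth, prior)]


# Hypothetical data: two complete training profiles and one partially observed scene.
prior = train_prior_offline([[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 4.0, 2.0]])
completed = complete_scene([2.0, None, None, 1.0], prior)
```

The point of the toy is only the structure: `prior` is computed once and reused unchanged for every new scene, while `fit_visible_online` starts from scratch for each scene and never sees the occluded positions.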

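📊 Experimental Highlights

On the 3D-CRS and iTHOR datasets, the method improves 3D reconstruction completeness by 16.8% and 24.2%, respectively, significantly outperforming existing state-of-the-art methods. These results support the method's effectiveness at completing occluded surfaces.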
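
This summary does not define how completeness is measured. One common choice in reconstruction evaluation is a completion ratio: the fraction of ground-truth surface points that have a reconstructed point within some distance threshold. The sketch below implements that generic metric with a brute-force search; the function name, the threshold, and the sample points are assumptions, and the paper's exact metric may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def completion_ratio(gt_points: List[Point], recon_points: List[Point], threshold: float) -> float:
    """Fraction of ground-truth points with a reconstructed point within `threshold`."""
    if not gt_points:
        raise ValueError("gt_points must not be empty")
    covered = 0
    for g in gt_points:
        # Brute-force nearest-neighbour check; fine for small illustrative inputs.
        if any(math.dist(g, r) <= threshold for r in recon_points):
            covered += 1
    return covered / len(gt_points)


# Hypothetical samples: the reconstruction covers only the first two ground-truth points.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.01), (1.02, 0.0, 0.0)]
ratio = completion_ratio(gt, recon, threshold=0.05)
```

Under a metric of this kind, occluded surfaces that are missing from the reconstruction count directly against the score, which is why completing them raises completeness.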

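🎯 Application Scenarios

This work has broad potential in indoor scene reconstruction, virtual reality, and augmented reality. More complete 3D reconstructions can provide more realistic environment models for indoor navigation, smart homes, and game development, which in turn can improve user experience and interaction. In the future, the method could be extended to more complex scenes and dynamic environments.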

📄 Abstract (original)

In this paper, we present a novel indoor 3D reconstruction method with occluded surface completion, given a sequence of depth readings. Prior state-of-the-art (SOTA) methods only focus on the reconstruction of the visible areas in a scene, neglecting the invisible areas due to the occlusions, e.g., the contact surface between furniture, occluded wall and floor. Our method tackles the task of completing the occluded scene surfaces, resulting in a complete 3D scene mesh. The core idea of our method is learning 3D geometry prior from various complete scenes to infer the occluded geometry of an unseen scene from solely depth measurements. We design a coarse-fine hierarchical octree representation coupled with a dual-decoder architecture, i.e., Geo-decoder and 3D Inpainter, which jointly reconstructs the complete 3D scene geometry. The Geo-decoder with detailed representation at fine levels is optimized online for each scene to reconstruct visible surfaces. The 3D Inpainter with abstract representation at coarse levels is trained offline using various scenes to complete occluded surfaces. As a result, while the Geo-decoder is specialized for an individual scene, the 3D Inpainter can be generally applied across different scenes. We evaluate the proposed method on the 3D Completed Room Scene (3D-CRS) and iTHOR datasets, significantly outperforming the SOTA methods by a gain of 16.8% and 24.2% in terms of the completeness of 3D reconstruction. 3D-CRS dataset including a complete 3D mesh of each scene is provided at project webpage.