GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

📄 arXiv: 2404.01810v2

Authors: Yaniv Wolf, Amit Bracha, Ron Kimmel

Category: cs.CV

Published: 2024-04-02 (updated: 2024-07-17)

Comments: ECCV 2024. Project page: https://gs2mesh.github.io/


💡 One-Sentence Summary

Proposes GS2Mesh, a method for surface reconstruction from 3D Gaussian Splatting (3DGS).

🎯 Matched Area: Pillar 3: Spatial Perception & Semantics

Keywords: 3D reconstruction, Gaussian Splatting, stereo matching, depth extraction, computer vision, virtual reality, augmented reality

📋 Key Points

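  1. Existing 3D Gaussian Splatting methods extract noisy, unrealistic geometry, which makes smooth surface reconstruction difficult.
  2. The paper instead extracts depth with a pre-trained stereo-matching model applied to rendered stereo image pairs, improving the quality of the reconstructed geometry.
  3. On in-the-wild scenes and on the Tanks and Temples and DTU benchmarks, the method produces smoother and more accurate reconstructions than competing approaches.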

📝 Abstract (Summary)

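3D Gaussian Splatting (3DGS) has recently gained attention as an efficient scene representation. However, extracting geometry directly from the Gaussian properties remains difficult, and existing methods tend to produce noisy, unrealistic surfaces. This paper injects real-world knowledge into the depth extraction process: rather than reading geometry off the Gaussian properties, it extracts depth with a pre-trained stereo-matching model. Stereo image pairs corresponding to the original training poses are rendered and passed to the stereo model to obtain depth profiles, which are then fused into a single smooth, detailed 3D mesh. Experiments on in-the-wild scenes and on several benchmarks show clearly improved reconstructions.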

🔬 Method Details

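Problem definition: The paper targets the noisy, unrealistic surfaces obtained when extracting geometry from 3D Gaussian Splatting. The Gaussians are optimized mainly with a photometric loss, so their geometry is poorly constrained; even concurrent methods that add geometric constraints during optimization still produce noisy surfaces.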

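Core idea: Instead of extracting geometry directly from the Gaussian properties, the paper extracts depth with a pre-trained stereo-matching model. This brings the real-world knowledge learned by the stereo model into the reconstruction and improves its quality.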
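A stereo model predicts disparity rather than depth. For a rectified pair with horizontal baseline B and focal length f (in pixels), a point at depth Z shifts by d = fB/Z pixels between the two views, so Z = fB/d. The sketch below applies this standard pinhole relation. The function name, the NaN convention for invalid pixels, and the `min_disp` threshold are illustrative assumptions, not taken from the GS2Mesh code.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline, min_disp=1e-6):
    """Convert a rectified-stereo disparity map (in pixels) to depth.

    Uses the pinhole relation Z = f * B / d. Depth is returned in the same
    units as `baseline`; pixels with disparity <= min_disp are set to NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp  # zero/negative disparity has no finite depth
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth


# Example: f = 500 px, B = 0.1 scene units.
depth = disparity_to_depth(np.array([[10.0, 20.0], [0.0, 40.0]]), 500.0, 0.1)
# depth is [[5.0, 2.5], [nan, 1.25]]
```

Because depth is inversely proportional to disparity, a fixed one-pixel disparity error costs more depth accuracy for distant points than for near ones, which is one reason the choice of baseline matters.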

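Technical framework: The pipeline renders stereo-aligned image pairs corresponding to the original training poses, feeds each pair into the stereo model to obtain a depth profile, and finally fuses all depth profiles into a single 3D mesh.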
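Rendering a stereo-aligned pair from a trained 3DGS scene only needs a second camera placed next to each training camera. The following is a minimal sketch that assumes world-to-camera extrinsics (x_cam = R x_world + t) and a shared intrinsic matrix K. The right camera keeps the orientation R and is shifted by a baseline B along the left camera's x-axis, which reduces to subtracting B from the x-component of t. The helper names and the baseline value are hypothetical: the summary does not say how GS2Mesh chooses B, and the actual 3DGS rendering call is omitted.

```python
import numpy as np

def make_right_camera(R, t, baseline):
    """Rectified right camera for a left (training) camera (R, t).

    Convention assumed here: world-to-camera extrinsics, x_cam = R @ x_world + t.
    The right camera keeps the same orientation, and its optical center is moved
    `baseline` units along the left camera's +x axis (the first row of R).
    """
    R = np.asarray(R, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    # C_right = C_left + B * R.T @ e_x  implies  t_right = -R @ C_right = t - B * e_x
    t_right = t - baseline * np.array([1.0, 0.0, 0.0])
    return R.copy(), t_right

def camera_center(R, t):
    """Optical center in world coordinates: C = -R^T t."""
    return -np.asarray(R).T @ np.asarray(t)

def project(K, R, t, X):
    """Pinhole projection of a world point X to pixel coordinates (u, v)."""
    x_cam = np.asarray(R) @ np.asarray(X, dtype=np.float64) + np.asarray(t)
    uvw = np.asarray(K) @ x_cam
    return uvw[:2] / uvw[2]
```

Because both cameras share R and K, any 3D point projects to the same image row in both views, and the column difference equals fB/Z. This is the rectified geometry a stereo-matching network expects as input.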

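Key innovation: Extracting depth through a stereo-matching model, rather than from the Gaussians themselves, bridges the gap between the noisy 3DGS representation and the smooth 3D mesh representation.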

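Key design: The method uses an off-the-shelf pre-trained stereo-matching network together with a fusion strategy suited to combining the per-view depth profiles, so that the final mesh is smooth while preserving fine detail.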
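The summary does not spell out the fusion strategy. One common way to merge many per-view depth maps into a single surface is truncated signed distance function (TSDF) fusion. In this approach, each voxel keeps a weighted running average of its truncated signed distance to the observed surface, and the mesh is the zero level set. The following is a minimal NumPy sketch of that idea. It assumes the same world-to-camera convention (x_cam = R x_world + t) and depth maps with NaN for invalid pixels. The class and its parameters are hypothetical and should not be read as GS2Mesh's implementation.

```python
import numpy as np

class TSDFVolume:
    """Minimal TSDF volume (hypothetical sketch) fused from posed depth maps."""

    def __init__(self, origin, voxel_size, dims, trunc):
        self.origin = np.asarray(origin, dtype=np.float64)
        self.voxel_size = float(voxel_size)
        self.dims = tuple(dims)
        self.trunc = float(trunc)
        n = int(np.prod(self.dims))
        self.tsdf = np.ones(n)     # flat, C-order over dims; 1.0 = free/unobserved
        self.weight = np.zeros(n)  # number of observations fused per voxel
        # World coordinates of every voxel center, shape (n, 3), same C order.
        idx = np.stack(np.meshgrid(*[np.arange(k) for k in self.dims], indexing="ij"), axis=-1)
        self.centers = self.origin + (idx.reshape(-1, 3) + 0.5) * self.voxel_size

    def integrate(self, depth, K, R, t):
        """Fuse one depth map (H x W, NaN = invalid) seen by camera (K, R, t)."""
        H, W = depth.shape
        x_cam = self.centers @ np.asarray(R).T + np.asarray(t)  # world -> camera
        z = x_cam[:, 2]
        front = z > 1e-9
        u = np.full(z.shape, -1, dtype=np.int64)
        v = np.full(z.shape, -1, dtype=np.int64)
        uvw = x_cam[front] @ np.asarray(K).T                  # pinhole projection
        u[front] = np.round(uvw[:, 0] / uvw[:, 2]).astype(np.int64)
        v[front] = np.round(uvw[:, 1] / uvw[:, 2]).astype(np.int64)
        visible = front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        d = np.full(z.shape, np.nan)
        d[visible] = depth[v[visible], u[visible]]
        ok = ~np.isnan(d)  # visible and with a valid depth reading
        # Projective signed distance along the optical axis (+ = in front of surface).
        sdf = np.where(ok, d, 0.0) - z
        valid = ok & (sdf > -self.trunc)  # skip voxels far behind the surface
        obs = np.clip(sdf[valid] / self.trunc, -1.0, 1.0)
        w_old = self.weight[valid]
        # Running weighted average with unit weight per observation.
        self.tsdf[valid] = (self.tsdf[valid] * w_old + obs) / (w_old + 1.0)
        self.weight[valid] = w_old + 1.0
```

After every view has been integrated, a triangle mesh could be extracted from the zero level set of `vol.tsdf.reshape(vol.dims)`, for example with `skimage.measure.marching_cubes`. Voxels that were never observed keep weight 0 and value 1.0, so they would typically be masked out during extraction.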

📊 Experimental Highlights

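The method performs well on in-the-wild scenes captured with a smartphone. Its reconstructions are clearly better than those of existing methods for surface reconstruction from Gaussian Splatting. On the Tanks and Temples and DTU benchmarks it achieves state-of-the-art results, demonstrating its strength on complex scenes.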

🎯 Applications

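The work has broad application potential, especially in computer vision, virtual reality, and augmented reality. Higher-quality 3D reconstruction can improve user experience and provide a foundation for downstream scene understanding and interaction. The method may also be extended to reconstruction tasks on other kinds of visual data.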

📄 Abstract (Original)

Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we instead extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.