WorldGrow: Generating Infinite 3D World

Authors: Sikuang Li, Chen Yang, Jiemin Fang, Taoran Yi, Jia Lu, Jiazhong Cen, Lingxi Xie, Wei Shen, Qi Tian

Categories: cs.CV, cs.GR

Published: 2025-10-24

Notes: Project page: https://world-grow.github.io/ Code: https://github.com/world-grow/WorldGrow

💡 One-Sentence Summary

Proposes WorldGrow, a framework for generating infinitely extendable 3D worlds.

🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture; Pillar 3: Spatial Perception & Semantics; Pillar 9: Embodied Foundation Models

Keywords: 3D world generation, scene synthesis, virtual reality, geometry reconstruction, deep learning

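📋 Key Points

Existing methods for generating infinitely extendable 3D worlds struggle with geometric and appearance consistency and with scalability.
WorldGrow addresses these problems with a hierarchical scene synthesis framework that leverages the generation priors of pre-trained 3D models.
Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction and supports infinite scene generation.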

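📝 Abstract (Translated from Chinese)

This paper addresses the challenge of generating infinitely extendable 3D worlds and proposes the WorldGrow framework. Existing methods fall short in three ways: 2D-lifting approaches are geometrically and visually inconsistent across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, which limits scene-level generation. WorldGrow leverages the generation priors of pre-trained 3D models to build a hierarchical framework consisting of high-quality scene block extraction, a context-aware 3D block inpainting mechanism, and a coarse-to-fine generation strategy. Experiments show that WorldGrow achieves state-of-the-art performance in geometry reconstruction and supports generating infinite scenes that are photorealistic and structurally consistent.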

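🔬 Method Details

Problem definition: The paper aims to generate infinitely extendable 3D worlds. Existing methods show clear deficiencies in geometric and appearance consistency, in scalability, and in scene-level generation capability, which limits their applicability.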

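Core idea: WorldGrow leverages the strong generation priors of pre-trained 3D models to generate structured scene blocks. Generating the scene block by block with a shared prior makes the result more consistent in both geometry and appearance.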

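Technical framework: The WorldGrow architecture has three main modules: a data curation pipeline, a 3D block inpainting mechanism, and a coarse-to-fine generation strategy. The data curation pipeline extracts high-quality scene blocks for training, the 3D block inpainting mechanism enables context-aware scene extension, and the coarse-to-fine generation strategy ensures both a plausible global layout and faithful local geometry and texture.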
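The context-aware extension step can be pictured as a loop that adds one block at a time while treating already generated cells as fixed context. The following is a minimal sketch under strong assumptions, not the authors' implementation: the world is flattened to a 2D grid of scalar cells rather than 3D structured latents, `BLOCK` and `OVERLAP` are illustrative values, and `inpaint_block` is a hypothetical stand-in that fills unknown cells with random numbers where the real system would call a learned inpainting model.

```python
import random

BLOCK = 8     # side length of one block, in cells (illustrative value)
OVERLAP = 2   # cells shared with previously generated neighbours (illustrative value)

def inpaint_block(window, known, rng):
    """Hypothetical stand-in for a learned block inpainting model.

    Cells flagged in `known` are kept as context; all other cells are
    filled with random values so the control flow runs without a network.
    """
    return [
        [cell if is_known else rng.random() for cell, is_known in zip(w_row, k_row)]
        for w_row, k_row in zip(window, known)
    ]

def grow_world(rows, cols, seed=0):
    """Grow a rows x cols arrangement of overlapping blocks in raster order."""
    rng = random.Random(seed)
    stride = BLOCK - OVERLAP
    height = stride * rows + OVERLAP
    width = stride * cols + OVERLAP
    world = [[0.0] * width for _ in range(height)]
    generated = [[False] * width for _ in range(height)]
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * stride, c * stride
            # Cut out the window for this block, including the overlap strip.
            window = [row[x0:x0 + BLOCK] for row in world[y0:y0 + BLOCK]]
            known = [row[x0:x0 + BLOCK] for row in generated[y0:y0 + BLOCK]]
            block = inpaint_block(window, known, rng)
            # Write the block back and mark its cells as generated.
            for dy in range(BLOCK):
                for dx in range(BLOCK):
                    world[y0 + dy][x0 + dx] = block[dy][dx]
                    generated[y0 + dy][x0 + dx] = True
    return world, generated
```

Calling `grow_world(2, 3)` produces a 14 x 20 grid in which every cell has been generated exactly once. Because the overlap strip is passed in as known context, later blocks never overwrite cells produced by earlier ones, which is the property that lets the grid be extended without disturbing what already exists.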

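Key innovation: WorldGrow's main contributions are its hierarchical framework and its context-aware block inpainting mechanism. This differs fundamentally from the object-centric generation of existing 3D foundation models and yields more coherent scenes.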
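The hierarchical, coarse-to-fine idea can be illustrated in two stages: a low-resolution layout fixes the global structure, and a second stage adds local detail on top of it. This is only a sketch under assumptions: the layout is a 2D grid of scalars, `upsample_nearest` uses plain nearest-neighbour repetition, and `refine` is a hypothetical placeholder that adds bounded random detail where a real system would run a fine-stage generator.

```python
import random

def upsample_nearest(coarse, factor):
    """Repeat every coarse cell `factor` times along both axes."""
    fine = []
    for row in coarse:
        wide = [value for value in row for _ in range(factor)]
        fine.extend(list(wide) for _ in range(factor))
    return fine

def refine(fine_init, rng, detail_scale=0.1):
    """Hypothetical fine stage: perturbs the upsampled layout with small detail."""
    return [
        [value + rng.uniform(-detail_scale, detail_scale) for value in row]
        for row in fine_init
    ]

def coarse_to_fine(coarse_layout, factor=4, seed=0):
    """Upsample a coarse layout, then add local detail on top of it."""
    rng = random.Random(seed)
    return refine(upsample_nearest(coarse_layout, factor), rng)
```

In this toy version the fine result never drifts more than `detail_scale` from the coarse layout, which mirrors the intent that the coarse stage decides where things go and the fine stage only decides how they look up close.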

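Key design: WorldGrow extracts high-quality scene blocks for training and combines them with dedicated loss functions and a network architecture chosen to keep the generated geometry and texture consistent.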
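Scene block extraction can be sketched as slicing a scene into fixed-size blocks and keeping only those that pass a quality check. The criterion below is an assumption made for illustration: the summary does not say how "high quality" is defined, so this sketch uses a hypothetical `min_fill` threshold on the fraction of occupied cells in a 2D top-down occupancy grid.

```python
def extract_blocks(occupancy, block, min_fill=0.5):
    """Slice an occupancy grid into non-overlapping blocks and keep dense ones.

    `occupancy` is a list of rows holding 0 or 1. The `min_fill` density
    threshold is a hypothetical quality criterion, not one from the paper.
    """
    kept = []
    rows, cols = len(occupancy), len(occupancy[0])
    for y in range(0, rows - block + 1, block):
        for x in range(0, cols - block + 1, block):
            patch = [row[x:x + block] for row in occupancy[y:y + block]]
            fill = sum(map(sum, patch)) / (block * block)
            if fill >= min_fill:
                kept.append(((y, x), patch))
    return kept
```

Each kept entry pairs the block's top-left position with its cells, so a training set built this way retains where every block came from in the source scene.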

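📊 Experimental Highlights

On the large-scale 3D-FRONT dataset, WorldGrow achieves state-of-the-art performance in geometry reconstruction, with significant gains over existing baselines on multiple evaluation metrics, and it supports generating infinitely extendable scenes.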

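🎯 Applications

Potential applications include virtual reality, game development, and urban planning. By generating large-scale virtual environments, WorldGrow can provide immersive experiences and may play an important role in building future world models.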

📄 Abstract (Original)

We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is leveraging strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.
