Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models

Authors: Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao

Category: cs.AI

Published: 2025-05-30

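💡 One-Sentence Summary

Proposes a stepwise generative AI framework for urban design that integrates multimodal diffusion models with human expertise.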

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: generative AI, urban design, multimodal diffusion models, human-AI collaboration, stepwise generation

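📋 Key Points

Existing generative AI methods for urban design typically use end-to-end pipelines that offer little fine-grained control over the design process and no room for iterative refinement.
The paper proposes a stepwise generative framework that splits the urban design workflow into stages and brings in human expertise at each stage to guide and correct the output.
Experiments show that the framework outperforms existing baseline models and end-to-end approaches in the fidelity, compliance, and diversity of the generated designs.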

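📝 Abstract (Translated from Chinese)

Urban design is a complex process that requires careful attention to site-specific constraints and collaboration among different professionals and stakeholders. Generative artificial intelligence (GenAI) offers transformative potential by making design generation more efficient and by easing the communication of design ideas. Most existing approaches, however, fit poorly with human design workflows: they usually follow end-to-end pipelines with limited control and overlook the iterative nature of real-world design. This study proposes a stepwise generative urban design framework that combines multimodal diffusion models with human expertise to make the design process more adaptive and controllable. Rather than producing a design in a single end-to-end pass, the framework divides the process into three key stages aligned with established urban design workflows: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. At each stage, multimodal diffusion models generate preliminary designs from textual prompts and image-based constraints, which human designers then review and refine. The authors design an evaluation framework to assess the fidelity, compliance, and diversity of the generated designs. Experiments using data from Chicago and New York City show that the framework outperforms baseline models and end-to-end approaches on all three dimensions. The study highlights the advantages of multimodal diffusion models and stepwise generation for preserving human control and supporting iterative refinement, and lays a foundation for human-AI interaction in urban design.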

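🔬 Method Details

Problem definition: Existing generative AI methods for urban design mainly use end-to-end generation. They lack fine-grained control over the design process and struggle to support the iterative refinement and expert intervention that real urban design requires. They also tend to ignore the complex, multi-stage nature of urban design, so the knowledge and experience of human designers go largely unused.

Core idea: The paper splits the urban design workflow into several key stages. At each stage, multimodal diffusion models generate a preliminary design, which human designers then review, modify, and refine. Because experts can intervene at each stage, they keep better control over the direction and quality of the design.

Technical framework: The framework has three main stages: (1) road network and land use planning; (2) building layout planning; (3) detailed planning and rendering. At each stage, the framework takes textual prompts and image constraints as input and uses multimodal diffusion models to generate a preliminary design. Human designers evaluate and modify that design, and the modified design becomes the input to the next stage. The process iterates until the final design meets the design requirements.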
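The stage-by-stage flow described above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `Stage`, `run_stepwise`, `fake_generator`, and `approve_with_note` are hypothetical names, designs are represented as plain strings rather than images, and the generator is a stand-in for a call to a multimodal diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types: a "design" is a string standing in for an image.
Design = str
Generator = Callable[[str, Optional[Design]], Design]  # (prompt, constraint) -> draft
Reviewer = Callable[[str, Design], Design]             # (stage name, draft) -> revised


@dataclass
class Stage:
    name: str
    prompt: str
    generate: Generator


def run_stepwise(stages: List[Stage], review: Reviewer,
                 site_constraint: Optional[Design] = None) -> List[Design]:
    """Run the stages in order, feeding each human-revised output forward."""
    constraint = site_constraint
    history: List[Design] = []
    for stage in stages:
        draft = stage.generate(stage.prompt, constraint)  # model proposes a draft
        revised = review(stage.name, draft)               # human reviews and edits
        history.append(revised)
        constraint = revised                              # next stage conditions on it
    return history


def fake_generator(tag: str) -> Generator:
    # Stand-in for a diffusion model: wraps the constraint in a stage tag.
    def generate(prompt: str, constraint: Optional[Design]) -> Design:
        return f"{tag}({constraint})"
    return generate


def approve_with_note(stage_name: str, draft: Design) -> Design:
    # Stand-in for a human designer: marks the draft as reviewed.
    return draft + "*"


stages = [
    Stage("road_landuse", "grid streets, mixed use", fake_generator("roads")),
    Stage("building_layout", "mid-rise blocks", fake_generator("buildings")),
    Stage("rendering", "aerial render", fake_generator("render")),
]
results = run_stepwise(stages, approve_with_note, site_constraint="site")
print(results[-1])
```

In this sketch the reviewer is called once per stage. A reviewer that rejects a draft and asks for regeneration could be modeled by looping inside the stage until the reviewer accepts, which is closer to the iterative refinement the summary describes.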

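Key innovation: The main contribution is the combination of multimodal diffusion models with human expertise in a collaborative human-AI design workflow. Unlike conventional end-to-end generation, the method lets human experts intervene at each stage of the design. It also draws on the generative capacity of multimodal diffusion models to produce diverse design options.

Key design details: According to this summary, the paper does not give details on parameter settings, loss functions, or network architectures. These may differ by stage and by task. Future work could explore them further to improve the quality and efficiency of the generated designs.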

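📊 Experimental Highlights

On data from Chicago and New York City, the framework outperforms both the baseline models and the end-to-end approaches, with gains on all three evaluation dimensions: fidelity, compliance, and diversity. These results support the effectiveness of multimodal diffusion models and stepwise generation for urban design and provide a basis for future research on human-AI collaborative design.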
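The summary does not say how fidelity, compliance, or diversity are computed. As a purely illustrative assumption, and not the paper's metric, one common proxy for diversity is the mean pairwise distance between feature vectors of the generated designs. The function name `mean_pairwise_distance` and the toy feature vectors below are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def mean_pairwise_distance(samples: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance over all unordered pairs of samples.

    Assumed diversity proxy for illustration only; higher means the
    samples are more spread out in feature space.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0  # fewer than two samples: no pairwise spread to measure
    total = sum(
        sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        for x, y in pairs
    )
    return total / len(pairs)


# Toy feature vectors standing in for embeddings of generated designs.
identical = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
spread = [[0.0, 0.0], [3.0, 4.0]]
print(mean_pairwise_distance(identical))  # 0.0
print(mean_pairwise_distance(spread))     # 5.0
```

A set of identical designs scores zero under this proxy, which is why a diversity measure is usually reported alongside fidelity rather than on its own.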

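🎯 Application Scenarios

The work can be applied to urban planning, architectural design, and landscape design, giving urban designers an efficient and flexible design tool. Human-AI collaboration can speed up the design workflow, improve design quality, and produce a wider range of design options. In the future, the approach may also be applied to smart city development, supporting more evidence-based and sustainable solutions for urban growth.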

📄 Abstract (Original)

Urban design is a multifaceted process that demands careful consideration of site-specific constraints and collaboration among diverse professionals and stakeholders. The advent of generative artificial intelligence (GenAI) offers transformative potential by improving the efficiency of design generation and facilitating the communication of design ideas. However, most existing approaches are not well integrated with human design workflows. They often follow end-to-end pipelines with limited control, overlooking the iterative nature of real-world design. This study proposes a stepwise generative urban design framework that integrates multimodal diffusion models with human expertise to enable more adaptive and controllable design processes. Instead of generating design outcomes in a single end-to-end process, the framework divides the process into three key stages aligned with established urban design workflows: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. At each stage, multimodal diffusion models generate preliminary designs based on textual prompts and image-based constraints, which can then be reviewed and refined by human designers. We design an evaluation framework to assess the fidelity, compliance, and diversity of the generated designs. Experiments using data from Chicago and New York City demonstrate that our framework outperforms baseline models and end-to-end approaches across all three dimensions. This study underscores the benefits of multimodal diffusion models and stepwise generation in preserving human control and facilitating iterative refinements, laying the groundwork for human-AI interaction in urban design solutions.
