P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

Authors: Yikang Yang, Zhanpeng Hu, Youtian Lin, Mengqi Zhou, Jingxi Xu, Feihu Zhang, Jiaheng Liu, Yao Yao

Categories: cs.CV

Published: 2026-06-09

Note: Project page: https://lucasqaq.github.io/p3d/

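💡 One-Sentence Summary

Proposes P3D-Bench, a benchmark for evaluating parametric 3D generation and structural reasoning

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: parametric 3D generation, multimodal large language models, structural reasoning, benchmark evaluation, geometric fidelity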

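📋 Key Points

Existing benchmarks rarely evaluate 3D modeling through code, so parametric 3D generation remains under-evaluated.
P3D-Bench provides a unified evaluation framework covering Text-to-3D, Image-to-3D and Assembly-3D tasks, with an emphasis on geometric precision and structural consistency.
Experiments show that models often recover the global shape and semantic identity of the target object, but still fall well short on precise parametric geometry and part-level modeling.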

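📝 Abstract (translated)

Multimodal large language models can write code to produce complex programs and can use those programs for 3D modeling, which opens a new route to 3D generation driven by their priors, world knowledge and reasoning ability. Yet existing benchmarks rarely evaluate 3D modeling through code. Such modeling requires more than runnable code: from a text or visual specification, a model must generate a parametric 3D program that is geometrically precise, semantically aligned and assembly-consistent. We introduce P3D-Bench, a benchmark for parametric 3D generation. Unlike a 3D mesh, a parametric 3D program exposes explicit dimensions, construction operations and part relations, revealing whether a model has recovered a design's structure and not just its appearance. Under a unified protocol, P3D-Bench covers three task families (Text-to-3D, Image-to-3D and Assembly-3D) and scores each output for executability, geometric fidelity, topology, text-grounded constraints, multiview semantic alignment and part-level structure. We evaluate frontier multimodal large language models and text-only LLMs on 400 text cases, 400 image cases and 203 annotated assemblies. The results show that assemblies are the hardest setting, where models still fail to compose multiple parts into a coherent structure.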

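🔬 Method Details

Problem definition: The paper addresses a gap in existing benchmarks, which rarely evaluate 3D modeling through code and therefore lack a thorough evaluation of parametric 3D generation. Runnable code alone is not enough: an evaluation also has to check geometric precision and structural consistency.

Core idea: P3D-Bench centers its evaluation on parametric 3D programs, so that it looks beyond whether a model reproduces an object's appearance and asks whether it recovers the design's structure and geometric features.

Technical framework: P3D-Bench is organized into three task families: Text-to-3D, Image-to-3D and Assembly-3D. All three run under a unified evaluation protocol that scores every output for, among other criteria, executability and geometric fidelity.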

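Key innovation: The main novelty of P3D-Bench is that it evaluates parametric 3D programs, which expose explicit dimensions, construction operations and part relations. This is fundamentally different from evaluating 3D meshes, which show the resulting appearance but not the structure behind it.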
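
To make the contrast with meshes concrete, here is a minimal Python sketch of what a parametric 3D program might look like. It is illustrative only: the source does not say which CAD language or library P3D-Bench programs are written in, and `Plate`, `ThroughHole`, `MountingPlate` and `build_mounting_plate` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Plate:
    """Hypothetical part: a rectangular plate with explicit dimensions (mm)."""
    width: float
    depth: float
    thickness: float

    def volume(self) -> float:
        return self.width * self.depth * self.thickness


@dataclass
class ThroughHole:
    """Hypothetical construction operation: a hole drilled through a plate."""
    diameter: float
    x: float  # hole centre, measured from the plate centre
    y: float

    def removed_volume(self, plate: Plate) -> float:
        return math.pi * (self.diameter / 2) ** 2 * plate.thickness


@dataclass
class MountingPlate:
    """A base plate plus an ordered list of operations: the 'program'."""
    plate: Plate
    holes: list[ThroughHole] = field(default_factory=list)

    def volume(self) -> float:
        # Assumes every hole lies inside the plate and holes do not overlap.
        removed = sum(h.removed_volume(self.plate) for h in self.holes)
        return self.plate.volume() - removed


def build_mounting_plate(width=60.0, depth=40.0, thickness=5.0,
                         hole_d=6.0, inset=8.0) -> MountingPlate:
    """Place four corner holes relative to the plate edges (a part relation)."""
    hx, hy = width / 2 - inset, depth / 2 - inset
    holes = [ThroughHole(hole_d, sx * hx, sy * hy)
             for sx in (-1, 1) for sy in (-1, 1)]
    return MountingPlate(Plate(width, depth, thickness), holes)
```

Changing `width` or `inset` moves the holes automatically, because their positions are derived from the plate dimensions. A mesh of the same shape would carry none of these dimensions or relations explicitly.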

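Key design: P3D-Bench scores each output on several metrics: executability, geometric fidelity, topology, text-grounded constraints, multiview semantic alignment and part-level structure. Together they are meant to give a comprehensive picture of a model's generation ability.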
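
The scoring axes could be collected in a per-output record such as the sketch below. This is an assumption-heavy illustration, not the benchmark's implementation: the source names the six axes but does not give their value ranges or say how (or whether) they are combined, so the `[0, 1]` range, the `summary` aggregate and the names `CaseScore` and `part_count_error` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseScore:
    """Hypothetical per-output record; fields mirror the six axes in the abstract."""
    executable: bool
    geometric_fidelity: Optional[float] = None   # assumed in [0, 1], higher is better
    topology: Optional[float] = None
    text_constraints: Optional[float] = None
    multiview_alignment: Optional[float] = None
    part_structure: Optional[float] = None       # assumed to apply to assemblies

    def summary(self) -> float:
        """Assumed aggregate: 0 if the program does not run, else the mean of the axes present."""
        if not self.executable:
            return 0.0
        axes = [self.geometric_fidelity, self.topology, self.text_constraints,
                self.multiview_alignment, self.part_structure]
        present = [a for a in axes if a is not None]
        return sum(present) / len(present) if present else 0.0


def part_count_error(predicted_parts: int, reference_parts: int) -> int:
    """Absolute difference in part count, one simple part-level check."""
    return abs(predicted_parts - reference_parts)
```

Gating on executability reflects the point that runnable code is necessary but not sufficient: a program that fails to run scores zero in this sketch, while one that runs is still judged on every remaining axis.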

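📊 Experimental Highlights

The experiments show that the assembly task is the hardest setting: models perform poorly at composing multiple parts into a coherent structure. Although models can recover the global shape and semantic identity of the target object, they still fall well short on precise parametric geometry and part-level modeling. These findings point to directions for future research.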

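🎯 Application Scenarios

Potential application areas include computer-aided design, virtual reality and game development, where the work could support more efficient 3D modeling tools and methods. In the future, P3D-Bench may become an important standard for evaluating and improving 3D generation models and help advance related technology.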

📄 Abstract (original)

Multimodal large language models can write code to produce complex programs as well as use programs to do 3D modeling, which opens up a new avenue for 3D generation powered by their priors, world knowledge and reasoning. Yet existing benchmarks rarely evaluate 3D modeling through code. Such modeling demands more than runnable code: from a text or visual specification, a model must generate a parametric 3D program that is geometrically precise, semantically aligned and assembly-consistent. We introduce P3D-Bench, a benchmark for parametric 3D generation. Unlike a 3D mesh, a parametric 3D program exposes explicit dimensions, construction operations and part relations, revealing whether a model recovers a design's structure, not just its appearance. Under a unified protocol, P3D-Bench covers three task families (Text-to-3D, Image-to-3D and Assembly-3D) and scores each output for executability, geometric fidelity, topology, text-grounded constraints, multiview semantic alignment and part-level structure. We evaluate frontier MLLMs and text-only LLMs on 400 text cases, 400 image cases and 203 annotated assemblies, with domain-specific models as reference points. Our extensive evaluation yields three findings. First, assemblies are the hardest setting, where models still fail to compose multiple parts into a coherent structure. Second, models can often recover the global shape and semantic identity of the target object, yet fail to reproduce the precise parametric geometry specified by the input. Third, part-level modeling remains weak on assemblies, where models recover neither the geometry of each part nor the right number of parts. These results position P3D-Bench as a benchmark for evaluating precise parametric geometry and part-level structure in parametric 3D generation.
