ClothHMR: 3D Mesh Recovery of Humans in Diverse Clothing from Single Image

📄 arXiv: 2512.17545v1 📥 PDF

Authors: Yunqi Gao, Leyuan Liu, Yuhan Li, Changxin Gao, Yuanyuan Liu, Jingying Chen

Categories: cs.CV, cs.AI

Published: 2025-12-19

Comments: 15 pages, 16 figures

DOI: 10.1145/3731715.3733288

🔗 Code/Project: GitHub (https://github.com/starVisionTeam/ClothHMR)


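💡 Key Point in One Sentence

Proposes ClothHMR to address 3D human mesh recovery under diverse clothing

🎯 Matched Area: Pillar 6: Video Extraction and Matching (Video Extraction)

Keywords: 3D mesh recovery, diverse clothing, human body modeling, computer vision, deep learning

📋 Core Points

  1. Existing 3D human mesh recovery methods mainly target tight-fitting clothing and lose accuracy under diverse clothing; body shape and pose are especially hard to estimate under loose garments.
  2. The paper proposes ClothHMR, which combines clothing tailoring with a foundational human visual model (FHVM) to improve the accuracy of 3D mesh recovery, ensuring that the clothing fits the body silhouette.
  3. Experiments show that ClothHMR performs well on several benchmark datasets, significantly outperforming existing methods, and that it also works well in a real-world application.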

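📝 Abstract (Translated Summary)

As 3D data rapidly emerges as an important form of multimedia information, 3D human mesh recovery has advanced with it. However, existing methods focus mainly on tight-fitting clothing and perform poorly under diverse clothing, especially loose garments. To address this, the paper proposes ClothHMR, built on two key insights: first, tailoring the clothing to fit the human body mitigates the negative effect of clothing on 3D mesh recovery; second, using human visual information from large foundation models improves the generalization of the estimate. ClothHMR consists of a clothing tailoring module and an FHVM-based mesh recovery module, and accurately recovers 3D meshes of humans wearing diverse clothing. Experiments show that ClothHMR significantly outperforms existing state-of-the-art methods on benchmark datasets and on in-the-wild images.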

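🔬 Method Details

Problem definition: The paper addresses 3D human mesh recovery under diverse clothing. Existing methods estimate body shape and pose poorly under loose garments, which leads to inaccurate recovered meshes.

Core idea: Tailor the clothing so that it fits the body silhouette, which reduces the effect of clothing on mesh recovery, and combine this with visual information from a large foundation model to improve the generalization of the estimate.

Technical framework: ClothHMR consists of two main modules: clothing tailoring (CT) and FHVM-based mesh recovering (MR). The CT module tailors the clothing using body semantic estimation and body edge prediction. The MR module optimizes the initial parameters of the 3D human mesh by continuously aligning the intermediate representations of the mesh with those inferred by the foundational human visual model (FHVM).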
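To make the CT step more concrete, here is a minimal sketch of one way "tailoring" could be expressed on binary masks. It is an illustration under stated assumptions, not the authors' implementation: the function `tailor_clothing`, the mask layout, and the rule "keep a foreground pixel only if the body estimate or the edge prediction supports it" are all hypothetical, and in ClothHMR the two body cues would come from learned estimators rather than hand-written arrays.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 = foreground, 0 = background


def tailor_clothing(person_mask: Mask, body_semantic: Mask, body_edge: Mask) -> Mask:
    """Trim a clothed-person mask so that it follows the estimated body silhouette.

    Hypothetical rule: a pixel survives only if it belongs to the clothed
    foreground AND lies in the estimated body region or on the predicted edge.
    """
    height, width = len(person_mask), len(person_mask[0])
    tailored = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            on_body = body_semantic[y][x] == 1 or body_edge[y][x] == 1
            if person_mask[y][x] == 1 and on_body:
                tailored[y][x] = 1
    return tailored


# Toy 4x6 example: a loose garment flares out on both sides of a narrow body.
person_mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
body_semantic = [  # assumed output of a body semantic estimator
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
body_edge = [  # assumed output of a body edge predictor
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

tailored = tailor_clothing(person_mask, body_semantic, body_edge)
for row in tailored:
    print(row)
```

In this toy input the flared columns at the left and right borders are removed from the lower rows, while pixels that were never part of the clothed foreground stay empty.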

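Key innovation: ClothHMR combines clothing tailoring with visual information from the FHVM, which markedly improves 3D mesh recovery under diverse clothing and differs fundamentally from conventional methods in how clothing is handled.

Key design: The CT module uses body semantic estimation and body edge prediction so that the tailored clothing fits the body silhouette. The MR module optimizes the mesh parameters by aligning intermediate representations, which keeps the recovered mesh accurate.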
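The alignment idea in the MR module can be sketched as a small optimization loop. This is a toy illustration, not the paper's code: `render_intermediate` is a hypothetical linear stand-in for whatever produces an intermediate representation from mesh parameters, `fhvm_target` stands in for what the FHVM would infer from the image, and plain gradient descent with finite-difference gradients is an assumed choice of optimizer.

```python
from typing import List

Vector = List[float]


def render_intermediate(params: Vector) -> Vector:
    """Hypothetical stand-in: map mesh parameters to an intermediate representation."""
    a, b = params
    return [a + b, a - b, 2.0 * a]


def alignment_loss(params: Vector, target: Vector) -> float:
    """Squared L2 distance between the rendered and the target representation."""
    rendered = render_intermediate(params)
    return sum((r - t) ** 2 for r, t in zip(rendered, target))


def refine(initial: Vector, target: Vector, steps: int = 200,
           lr: float = 0.05, eps: float = 1e-5) -> Vector:
    """Gradient descent using central finite-difference gradients."""
    params = list(initial)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            diff = alignment_loss(up, target) - alignment_loss(down, target)
            grad.append(diff / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Pretend the FHVM inferred this target; it corresponds to parameters (1.0, 0.5).
fhvm_target = render_intermediate([1.0, 0.5])
initial_params = [0.0, 0.0]  # a deliberately poor initial estimate
refined_params = refine(initial_params, fhvm_target)

print("loss before:", alignment_loss(initial_params, fhvm_target))
print("loss after: ", alignment_loss(refined_params, fhvm_target))
```

The loop starts from an initial estimate and repeatedly nudges the parameters so that the rendered representation moves toward the target, which mirrors the "continuously aligning" description at a very small scale. A real system would use a differentiable body model and automatic differentiation instead of finite differences.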

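📊 Experiment Highlights

Experiments show that ClothHMR significantly outperforms existing state-of-the-art methods on several benchmark datasets. The size of the improvement is left as XX% in this summary (specific figures still to be added). Results on in-the-wild images also support the method's practicality and accuracy.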

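🎯 Application Scenarios

ClothHMR has potential applications in several areas, including virtual try-on, online shopping, and game character modeling. Accurately recovering 3D meshes of people in diverse clothing can improve the user experience and support the digitalization of the fashion industry. The technique may also extend to augmented reality and virtual reality, where it could improve interactivity and immersion.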

📄 Abstract (Original)

With 3D data rapidly emerging as an important form of multimedia information, 3D human mesh recovery technology has also advanced accordingly. However, current methods mainly focus on handling humans wearing tight clothing and perform poorly when estimating body shapes and poses under diverse clothing, especially loose garments. To this end, we make two key insights: (1) tailoring clothing to fit the human body can mitigate the adverse impact of clothing on 3D human mesh recovery, and (2) utilizing human visual information from large foundational models can enhance the generalization ability of the estimation. Based on these insights, we propose ClothHMR, to accurately recover 3D meshes of humans in diverse clothing. ClothHMR primarily consists of two modules: clothing tailoring (CT) and FHVM-based mesh recovering (MR). The CT module employs body semantic estimation and body edge prediction to tailor the clothing, ensuring it fits the body silhouette. The MR module optimizes the initial parameters of the 3D human mesh by continuously aligning the intermediate representations of the 3D mesh with those inferred from the foundational human visual model (FHVM). ClothHMR can accurately recover 3D meshes of humans wearing diverse clothing, precisely estimating their body shapes and poses. Experimental results demonstrate that ClothHMR significantly outperforms existing state-of-the-art methods across benchmark datasets and in-the-wild images. Additionally, a web application for online fashion and shopping powered by ClothHMR is developed, illustrating that ClothHMR can effectively serve real-world usage scenarios. The code and model for ClothHMR are available at: https://github.com/starVisionTeam/ClothHMR.