MotionCharacter: Fine-Grained Motion Controllable Human Video Generation

Authors: Haopeng Fang, Di Qiu, Binjie Mao, He Tang

Category: cs.CV

Published: 2024-11-27 (updated: 2026-01-03)

Note: Accepted by AAAI 2026

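💡 One-Sentence Takeaway

MotionCharacter: a human video generation framework with fine-grained motion control, aimed at the difficulty of controlling motion intensity.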

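🎯 Matched Area: Pillar 3: Spatial Perception and Semantics (Perception & Semantics)

Keywords: human video generation, motion control, text-to-video, identity preservation, optical flow, deep learning, fine-grained control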

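📋 Key Points

Existing text-to-video generation methods struggle to control the intensity of human motion at a fine-grained level, which limits their use in scenarios that demand high-precision control.
MotionCharacter decouples action type from motion intensity, and uses a Motion Control Module and an ID Content Insertion Module to achieve precise motion control and identity preservation.
Extensive experiments on the Human-Motion dataset show that MotionCharacter outperforms existing methods in motion control accuracy and identity preservation.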

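📝 Abstract (translated from Chinese)

In recent years, personalized text-to-video (T2V) generation has made notable progress in synthesizing content for specific characters. These methods, however, share a key limitation: they cannot control motion intensity at a fine-grained level. The limitation arises because coarse text descriptions inherently entangle an action's semantics with its magnitude. This hinders nuanced human video generation and restricts use in high-precision scenarios such as animating virtual avatars or synthesizing subtle micro-expressions. Existing methods also tend to struggle to keep identity fidelity high when other attributes are modified. To address these challenges, we propose MotionCharacter, a high-fidelity human video generation framework with precise motion control. At its core, MotionCharacter explicitly decouples motion into two independently controllable components: action type and motion intensity. Two technical contributions make this possible: (1) a Motion Control Module that uses text phrases to specify the action type and a quantifiable metric derived from optical flow to modulate its intensity, guided by a region-aware loss that localizes motion to the relevant subject regions; and (2) an ID Content Insertion Module, combined with an ID-Consistency loss, that ensures robust identity preservation during dynamic motion. To support training for this kind of fine-grained control, we also curate Human-Motion, a new large-scale dataset with detailed annotations of motion and facial features. Extensive experiments show that MotionCharacter improves substantially on existing methods. The framework generates videos that are both identity-consistent and faithful to the specified motion type and intensity.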

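🔬 Method Details

Problem definition: Existing human video generation methods, text-to-video (T2V) methods in particular, cannot control the intensity of a person's motion at a fine-grained level. Because the semantics and the intensity of an action are coupled in the text description, it is hard to generate high-quality videos with subtle motion variations, which limits use in high-precision scenarios such as virtual avatar animation and micro-expression synthesis. Existing methods also have difficulty keeping the person's identity consistent when other attributes change.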

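Core idea: MotionCharacter decouples human motion into two independently controllable components, action type and motion intensity. Controlling the two separately gives fine-grained control over the motion and yields videos that are more realistic and closer to the request. An ID Content Insertion Module and an ID-Consistency loss keep the person's identity stable while the motion changes.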

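Technical framework: The MotionCharacter framework has two core modules, the Motion Control Module and the ID Content Insertion Module. The Motion Control Module takes a text phrase as the action-type input and uses a motion intensity metric computed from optical flow to modulate the magnitude of the action. A region-aware loss guides the model to concentrate motion in the relevant regions of the person. The ID Content Insertion Module injects the person's identity information into the generated video and uses an ID-Consistency loss to keep the identity stable. The whole framework is trained end to end.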
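The intensity signal described above can be illustrated with a minimal sketch. It assumes the metric is the mean optical-flow magnitude over all pixels and frame pairs; the summary says only that the metric is derived from optical flow, so the exact formula, the function name `motion_intensity`, and the nested-list flow layout are assumptions made for illustration, not the authors' implementation.

```python
import math


def motion_intensity(flows):
    """Mean optical-flow magnitude over all frame pairs and pixels.

    `flows` holds one flow field per pair of consecutive frames. Each
    field is a list of rows, and each row is a list of (dx, dy) pixel
    displacements. This layout is hypothetical, chosen for readability.
    """
    total = 0.0
    count = 0
    for field in flows:
        for row in field:
            for dx, dy in row:
                total += math.hypot(dx, dy)  # per-pixel displacement length
                count += 1
    if count == 0:
        raise ValueError("no flow vectors given")
    return total / count


# Two tiny 1x2 flow fields: one static frame pair and one moving pair.
static = [[(0.0, 0.0), (0.0, 0.0)]]
moving = [[(3.0, 4.0), (0.0, 0.0)]]
score = motion_intensity([static, moving])
```

A scalar of this kind could then be passed to the generator as a conditioning value alongside the action phrase, so that the same phrase can be paired with a low or a high intensity.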

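Key innovation: The key innovation of MotionCharacter is motion decoupling with fine-grained control. Splitting motion into two independently controllable components, action type and motion intensity, enables precise control of the motion. The region-aware loss and the ID-Consistency loss further improve the accuracy of motion control and the stability of identity preservation.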
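One way to picture a region-aware loss is a mask-weighted reconstruction error, sketched below. The summary does not give the loss formula, so the weighting scheme, the `background_weight` parameter, and the flattened per-pixel inputs are all assumptions for illustration.

```python
def region_aware_loss(pred, target, mask, background_weight=0.1):
    """Mask-weighted mean squared error over flattened per-pixel values.

    pred, target: predicted and reference values, one float per pixel.
    mask: 1.0 inside the subject region, 0.0 elsewhere.
    background_weight: hypothetical down-weighting of background pixels.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have equal length")
    weighted_error = 0.0
    weight_sum = 0.0
    for p, t, m in zip(pred, target, mask):
        # Subject pixels get weight 1, background pixels a smaller weight.
        w = m + (1.0 - m) * background_weight
        weighted_error += w * (p - t) ** 2
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("all weights are zero")
    return weighted_error / weight_sum


mask = [1.0, 0.0]  # first pixel is on the subject, second is background
subject_loss = region_aware_loss([1.0, 0.0], [0.0, 0.0], mask)
background_loss = region_aware_loss([0.0, 1.0], [0.0, 0.0], mask)
```

With these inputs, the same unit error costs more when it falls on the subject than when it falls on the background, which is the behavior a loss that localizes motion to subject regions would need.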

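Key design: The Motion Control Module uses a text encoder to extract the semantics of the action type and uses optical flow to compute motion intensity. The region-aware loss guides motion generation by focusing on the key regions of the person. The ID Content Insertion Module uses a pretrained face recognition model to extract identity features and injects them into the generated video. The ID-Consistency loss uses a face recognition model to measure how stable the person's identity is across the generated video. The Human-Motion dataset contains a large amount of video data with detailed annotations of motion and facial features, which supplies ample training data for the model.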
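The identity measurement described above can be sketched as an average cosine distance between a reference face embedding and per-frame embeddings. This is a common formulation, but the summary does not confirm that the paper uses it; the function names and the toy two-dimensional embeddings are hypothetical, and a real pipeline would take embeddings from a face recognition model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def id_consistency_loss(reference, frame_embeddings):
    """Mean (1 - cosine similarity) between a reference identity
    embedding and the embedding of each generated frame."""
    if not frame_embeddings:
        raise ValueError("no frame embeddings given")
    distances = [1.0 - cosine_similarity(reference, e) for e in frame_embeddings]
    return sum(distances) / len(distances)


reference = [1.0, 0.0]
# Frames whose embeddings point the same way as the reference.
stable = id_consistency_loss(reference, [[2.0, 0.0], [1.0, 0.0]])
# One frame drifts to an orthogonal embedding.
drifting = id_consistency_loss(reference, [[1.0, 0.0], [0.0, 1.0]])
```

The loss is zero when every frame matches the reference direction and grows as frames drift away from it.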

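📊 Experimental Highlights

MotionCharacter was evaluated extensively on the Human-Motion dataset. The results show that the framework outperforms existing methods in both motion control accuracy and identity preservation. Specifically, it generates videos with precise motion intensity and a consistent identity, and it improves markedly on both quantitative metrics and visual quality. These results support the effectiveness of MotionCharacter for human video generation with fine-grained motion control.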

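🎯 Application Scenarios

MotionCharacter has broad potential applications, including virtual avatar animation, game character control, micro-expression synthesis, and video editing and generation. The technique can be used to create more realistic and more expressive character animation and gives users finer control. It could also be applied in areas such as security monitoring and medical diagnosis, for example by analyzing facial micro-expressions to identify potential threats or illnesses.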

📄 Abstract (original)

Recent advancements in personalized Text-to-Video (T2V) generation have made significant strides in synthesizing character-specific content. However, these methods face a critical limitation: the inability to perform fine-grained control over motion intensity. This limitation stems from an inherent entanglement of action semantics and their corresponding magnitudes within coarse textual descriptions, hindering the generation of nuanced human videos and limiting their applicability in scenarios demanding high precision, such as animating virtual avatars or synthesizing subtle micro-expressions. Furthermore, existing approaches often struggle to preserve high identity fidelity when other attributes are modified. To address these challenges, we introduce MotionCharacter, a framework for high-fidelity human video generation with precise motion control. At its core, MotionCharacter explicitly decouples motion into two independently controllable components: action type and motion intensity. This is achieved through two key technical contributions: (1) a Motion Control Module that leverages textual phrases to specify the action type and a quantifiable metric derived from optical flow to modulate its intensity, guided by a region-aware loss that localizes motion to relevant subject areas; and (2) an ID Content Insertion Module coupled with an ID-Consistency loss to ensure robust identity preservation during dynamic motions. To facilitate training for such fine-grained control, we also curate Human-Motion, a new large-scale dataset with detailed annotations for both motion and facial features. Extensive experiments demonstrate that MotionCharacter achieves substantial improvements over existing methods. Our framework excels in generating videos that are not only identity-consistent but also precisely adhere to specified motion types and intensities.
