Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

Authors: Sigal Raab, Inbar Gat, Nathan Sala, Guy Tevet, Rotem Shalev-Arkushin, Ohad Fried, Amit H. Bermano, Daniel Cohen-Or

Categories: cs.CV, cs.AI, cs.GR

Published: 2024-06-10

Notes: Video: https://www.youtube.com/watch?v=s5oo3sKV0YU, Project page: https://monkeyseedocg.github.io, Code: https://github.com/MonkeySeeDoCG/MoMo-code

💡 One-Sentence Summary

MoMo: zero-shot motion transfer that harnesses self-attention in motion diffusion models

🎯 Matched Areas: Pillar 4: Generative Motion; Pillar 7: Motion Retargeting; Pillar 8: Physics-based Animation

Keywords: motion transfer, diffusion models, self-attention, zero-shot learning, motion editing

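📋 Key Points

Existing diffusion-based motion editing methods do not fully exploit the prior embedded in pre-trained models, which makes fine-grained motion control difficult.
MoMo analyzes the attention mechanism of a motion diffusion model to extract and transfer motion features, achieving zero-shot motion transfer while preserving the follower's individual characteristics.
By incorporating motion inversion, MoMo can edit real motions as well as generated ones, which broadens the scope of motion editing.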

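📝 Abstract (Translated)

Given the notable results that diffusion models have achieved in motion synthesis, a natural question is how to use these models effectively for motion editing. Existing diffusion-based motion editing methods overlook the potential of the prior embedded in the weights of pre-trained models, a prior that makes it possible to manipulate the latent feature space; as a result, they mostly operate in motion space. This work explores the attention mechanism of pre-trained motion diffusion models. It reveals the roles and interactions of attention elements in capturing and representing complex human motion patterns, and carefully combines these elements to transfer a leader motion to a follower while keeping the follower's nuanced characteristics, which yields zero-shot motion transfer. Editing features tied to selected motions addresses a challenge seen in earlier motion diffusion approaches, which edit with general directives (e.g., text, music) and ultimately fail to convey subtle nuances. The work is inspired by how a monkey closely imitates what it sees while keeping its own motion patterns; hence the name Monkey See, Monkey Do, or MoMo. The technique supports tasks such as synthesizing out-of-distribution motions, style transfer, and spatial editing. In addition, diffusion inversion is seldom used for motion, so editing efforts have focused on generated motions and the editability of real motions has been limited. MoMo uses motion inversion to extend its application to both real and generated motions. Experimental results show the advantage of the approach over current techniques. In particular, unlike methods that are tailored to specific applications through training, the approach is applied at inference time and requires no training.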

🔬 Method Details

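Problem definition: Existing diffusion-based motion editing methods rely mainly on general directives (such as text or music) to guide motion generation, which makes it hard to capture and convey the subtle nuances and stylistic traits of a motion. Their ability to edit real motions is also limited.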

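Core idea: MoMo uses the self-attention mechanism of a pre-trained motion diffusion model to transfer the leader's motion features to the follower while preserving the follower's own motion style. Analyzing and manipulating the attention elements gives fine-grained control over motion features.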
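The core idea can be illustrated with a minimal sketch of mixed self-attention, in which one motion supplies the queries and the other supplies the keys and values. This summary does not specify which motion plays which role; the assignment below (queries from the leader, keys and values from the follower) is an assumption borrowed from cross-image attention methods, and the function names are hypothetical. Each frame is represented as a plain list of floats.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over per-frame feature vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Each output frame is a weighted average of the value frames.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def mixed_attention(q_leader, k_follower, v_follower):
    """ASSUMPTION: leader queries attend to follower keys/values.

    The output has one frame per leader frame, and every output frame
    is built only from follower value vectors.
    """
    return attention(q_leader, k_follower, v_follower)


# Toy example: 3 leader frames, 2 follower frames, 2 features per frame.
leader_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
follower_k = [[1.0, 0.0], [0.0, 1.0]]
follower_v = [[1.0, 0.0], [0.0, 1.0]]
transferred = mixed_attention(leader_q, follower_k, follower_v)
print(len(transferred), "output frames")
```

In a real model the queries, keys, and values would be the projected activations of a self-attention layer at a given denoising step, not raw motion frames.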

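Technical framework: MoMo consists of the following stages: 1) Pre-trained motion diffusion model: start from a motion diffusion model trained on a large amount of motion data; MoMo itself adds no further training. 2) Attention analysis: analyze the role of self-attention in the pre-trained model and identify the attention elements associated with motion style and features. 3) Motion feature extraction and transfer: extract the key motion features from the leader's motion and transfer them to the follower by manipulating the attention elements. 4) Motion generation: use the modified attention to generate a motion that follows the leader's motion while keeping the follower's own characteristics.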

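Key innovations: 1) It is presented as the first work to explore the role of self-attention in motion diffusion models for motion transfer. 2) It proposes an attention-based method for extracting and transferring motion features, which achieves zero-shot motion transfer. 3) It incorporates motion inversion, which extends motion editing to real motions.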
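Motion inversion maps a real motion back to a noise latent so that the diffusion model can regenerate and edit it. This summary does not say which inversion scheme MoMo uses; the sketch below assumes DDIM-style deterministic inversion, a common choice. The schedule, the function names, and the placeholder noise predictor are all hypothetical, and a motion is flattened to a list of floats.

```python
import math


def signal_and_noise(t, num_steps):
    """Return (sqrt(alpha_bar_t), sqrt(1 - alpha_bar_t)) for a cosine-style
    schedule. t = 0 is the clean motion; larger t means more noise."""
    angle = 0.5 * math.pi * t / (num_steps + 1)
    return math.cos(angle), math.sin(angle)


def _move(x, eps, t_from, t_to, num_steps):
    """Deterministic DDIM update from step t_from to step t_to."""
    a_from, s_from = signal_and_noise(t_from, num_steps)
    a_to, s_to = signal_and_noise(t_to, num_steps)
    # Predict the clean sample, then re-noise it to the target step.
    x0_pred = [(xi - s_from * ei) / a_from for xi, ei in zip(x, eps)]
    return [a_to * x0i + s_to * ei for x0i, ei in zip(x0_pred, eps)]


def ddim_invert(x0, eps_fn, num_steps):
    """Walk a clean motion forward to a latent at step num_steps."""
    x = list(x0)
    for t in range(num_steps):
        x = _move(x, eps_fn(x, t), t, t + 1, num_steps)
    return x


def ddim_sample(x_T, eps_fn, num_steps):
    """Walk a latent back to a clean motion with the same update rule."""
    x = list(x_T)
    for t in range(num_steps, 0, -1):
        x = _move(x, eps_fn(x, t), t, t - 1, num_steps)
    return x


def placeholder_eps(x, t):
    """ASSUMPTION: stands in for a trained noise-prediction network."""
    return [0.5 for _ in x]


real_motion = [1.0, -2.0, 0.25]
latent = ddim_invert(real_motion, placeholder_eps, num_steps=10)
reconstructed = ddim_sample(latent, placeholder_eps, num_steps=10)
print(max(abs(a - b) for a, b in zip(real_motion, reconstructed)))
```

The round trip here is exact up to floating-point error only because the placeholder predictor ignores its input. With a learned network, DDIM inversion reuses the noise estimate from the previous step, so the reconstruction is approximate.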

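Key design choices: 1) A Transformer architecture serves as the backbone of the motion diffusion model. 2) An attention manipulation strategy controls the strength of motion feature transfer. 3) Motion inversion maps real motions into the latent space so that they can be edited.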
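The summary does not describe how the transfer strength is controlled. One simple way to expose such a control, shown here purely as an illustrative assumption, is to interpolate linearly between a layer's ordinary self-attention output and the output obtained with transferred features. The function name and the `strength` parameter are hypothetical.

```python
def blend_features(self_out, transferred_out, strength):
    """Linearly blend two attention outputs frame by frame.

    strength = 0.0 keeps the original self-attention output,
    strength = 1.0 uses the transferred output only.
    ASSUMPTION: linear interpolation is an illustrative choice, not
    the strategy described in the source.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if len(self_out) != len(transferred_out):
        raise ValueError("both outputs need the same number of frames")
    return [
        [(1.0 - strength) * s + strength * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(self_out, transferred_out)
    ]


# Two frames with two features each.
original = [[0.0, 2.0], [4.0, 6.0]]
transferred = [[2.0, 0.0], [0.0, 2.0]]
half_way = blend_features(original, transferred, strength=0.5)
print(half_way)
```

A schedule could also vary `strength` per layer or per denoising step, but that is again a design choice outside what the source states.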

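📊 Experimental Highlights

MoMo is reported to perform well on zero-shot motion transfer. According to the experimental results, it transfers the leader's motion to the follower effectively while keeping the follower's own motion characteristics. Unlike existing methods, MoMo needs no task-specific training, which gives it greater generality and flexibility. This summary does not list quantitative results.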

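🎯 Applications

MoMo has a broad range of potential applications, including: 1) Motion style customization for virtual characters: transferring the motion style of a particular person or animal to a virtual character according to user needs. 2) Motion rehabilitation: helping patients imitate correct movement postures to improve rehabilitation outcomes. 3) Animation production: quickly generating character motions in a specific style. 4) Game development: designing more varied movements for game characters.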

📄 Abstract (Original)

Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handling the motion space. In this work, we explore the attention mechanism of pre-trained motion diffusion models. We uncover the roles and interactions of attention elements in capturing and representing intricate human motion patterns, and carefully integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer. Editing features associated with selected motions allows us to confront a challenge observed in prior motion diffusion approaches, which use general directives (e.g., text, music) for editing, ultimately failing to convey subtle nuances effectively. Our work is inspired by how a monkey closely imitates what it sees while maintaining its unique motion patterns; hence we call it Monkey See, Monkey Do, and dub it MoMo. Employing our technique enables accomplishing tasks such as synthesizing out-of-distribution motions, style transfer, and spatial editing. Furthermore, diffusion inversion is seldom employed for motions; as a result, editing efforts focus on generated motions, limiting the editability of real ones. MoMo harnesses motion inversion, extending its application to both real and generated motions. Experimental results show the advantage of our approach over the current art. In particular, unlike methods tailored for specific applications through training, our approach is applied at inference time, requiring no training. Our webpage is at https://monkeyseedocg.github.io.
