Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos

Authors: Cuong Le, Viktor Johansson, Manon Kok, Bastian Wandt

Category: cs.CV

Published: 2024-10-10 (updated: 2025-05-14)

Notes: 17 pages, 7 figures, NeurIPS 2024

🔗 Code/Project: GitHub (https://github.com/cuongle1206/OSDCap)

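💡 One-sentence summary

Proposes a physics-based human motion capture method built on neural Kalman filtering that improves motion smoothness and physical plausibility.

🎯 Matched area: Pillar 4: Generative Motion

Keywords: human motion capture, physics simulation, neural Kalman filtering, motion estimation, recurrent neural network

📋 Key points

Existing video-based human motion capture methods are prone to temporal artifacts such as jitter and struggle to guarantee physically plausible motion.
Inspired by neural Kalman filtering, the paper proposes an online method that selectively fuses the physics model with kinematics observations to obtain an optimal-state dynamics prediction.
A meta-PD controller predicts joint torques and reaction forces, and a Kalman filter realized by a recurrent neural network balances the kinematics input against the simulated motion.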

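📝 Abstract (translated)

Human motion capture from monocular videos has made significant progress in recent years. However, existing methods often produce temporal artifacts such as jitter and struggle to achieve smooth, physically plausible motion. Explicitly integrating physics, in the form of internal forces and external torques, helps alleviate these problems. Current state-of-the-art methods use an automatic PD controller to predict torques and reaction forces and thereby re-simulate the input kinematics, i.e. the joint angles of a predefined skeleton. Because the physics models are imperfect, however, these methods usually need simplifying assumptions and extensive preprocessing of the input kinematics to perform well. To address this, we propose a new method, inspired by neural Kalman filtering, that selectively combines the physics model with kinematics observations in an online setting. We develop a control loop that acts as a meta-PD controller to predict internal joint torques and external reaction forces, followed by a physics-based motion simulation. A recurrent neural network realizes a Kalman filter that attentively balances the kinematics input and the simulated motion, yielding an optimal-state dynamics prediction. We show that this filtering step is crucial for providing online supervision that helps offset the shortcomings of each input motion, and is therefore important both for capturing accurate global motion trajectories and for producing physically plausible human poses. Compared with the state of the art, the proposed method excels at physics-based human pose estimation and demonstrates the physical plausibility of the predicted dynamics. The code is available at https://github.com/cuongle1206/OSDCap.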

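🔬 Method details

Problem definition: Existing monocular-video human motion capture methods fall short on motion smoothness and physical realism, which shows up as jittery trajectories and motion that violates physical laws. They rely on an automatic PD controller and a physics model, but because the model is imperfect they need extensive preprocessing and simplifying assumptions, which limits their performance.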

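Core idea: Borrowing from neural Kalman filtering, the paper designs a framework that balances kinematics observations against physics simulation results online. A learned Kalman gain dynamically adjusts how much to trust the kinematics data versus the physics model, improving physical plausibility while keeping the motion trajectory accurate.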
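To make the role of the gain concrete, here is a minimal sketch of a gain-weighted blend between a simulated state and a kinematics observation. It is an illustration under stated assumptions, not the paper's implementation: the function names, the joint angles, and the variances are all hypothetical, and the gain here comes from the classic scalar Kalman formula rather than from a recurrent network.

```python
from typing import List


def scalar_kalman_gain(prior_var: float, obs_var: float) -> float:
    """Classic scalar Kalman gain: the share of trust given to the observation."""
    return prior_var / (prior_var + obs_var)


def fuse(simulated: List[float], observed: List[float], gains: List[float]) -> List[float]:
    """Blend simulated and observed states per dimension.

    A gain of 0 keeps the physics simulation; a gain of 1 keeps the kinematics.
    """
    return [s + k * (z - s) for s, z, k in zip(simulated, observed, gains)]


# Hypothetical joint angles (radians) for three joints.
simulated = [0.10, 0.50, -0.20]  # stand-in for the physics simulation output
observed = [0.14, 0.40, -0.20]   # stand-in for the kinematics estimator output

# Hypothetical (simulation variance, observation variance) pairs per joint.
# A learned filter would output the gains directly instead.
variances = [(0.01, 0.03), (0.04, 0.04), (0.02, 0.01)]
gains = [scalar_kalman_gain(p, r) for p, r in variances]

fused = fuse(simulated, observed, gains)
```

A joint whose simulation variance is low relative to its observation variance gets a small gain, so the fused value stays close to the simulated one; the opposite case pulls it toward the observation.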

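Technical framework: The overall framework has three main modules: 1) a meta-PD controller that predicts internal joint torques and external reaction forces; 2) a physics engine that simulates motion from the predicted torques and reaction forces; 3) a neural Kalman filter that uses a recurrent neural network (RNN) to fuse the kinematics observations with the simulation results and output the optimal state estimate. Together these form a control loop that iteratively refines the motion state.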
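The three-stage loop can be sketched for a single joint as follows. This is a toy example under strong assumptions, not the authors' code: the PD gains, inertia, time step, and blending gain are fixed hypothetical constants (the paper predicts them with networks), the "physics engine" is one semi-implicit Euler step, and external reaction forces are omitted.

```python
from typing import List, Tuple


def pd_torque(q_target: float, q: float, qd: float, kp: float, kd: float) -> float:
    """PD control law: pull toward the target angle and damp the velocity."""
    return kp * (q_target - q) - kd * qd


def simulate_step(q: float, qd: float, torque: float,
                  inertia: float, dt: float) -> Tuple[float, float]:
    """One semi-implicit Euler step for a single rotational joint."""
    qdd = torque / inertia
    qd_next = qd + qdd * dt
    q_next = q + qd_next * dt
    return q_next, qd_next


def run_loop(observations: List[float], kp: float = 60.0, kd: float = 8.0,
             inertia: float = 1.0, dt: float = 1.0 / 50.0,
             gain: float = 0.3) -> List[float]:
    """Control, simulate, then filter, once per frame."""
    q, qd = observations[0], 0.0
    estimates = []
    for z in observations:
        tau = pd_torque(z, q, qd, kp, kd)                        # 1) controller
        q_sim, qd_sim = simulate_step(q, qd, tau, inertia, dt)   # 2) simulation
        q = q_sim + gain * (z - q_sim)                           # 3) gain-weighted fusion
        qd = qd_sim
        estimates.append(q)
    return estimates


# Hypothetical jittery kinematics input: an angle near 1.0 rad with alternating noise.
observations = [1.0 + (0.05 if i % 2 else -0.05) for i in range(50)]
estimates = run_loop(observations)
```

Because each frame's estimate is carried forward as the next frame's starting state, the frame-to-frame jitter in `estimates` is much smaller than in `observations`, while the estimate still tracks the observed angle.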

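Key innovation: The most important contribution is the neural Kalman filter, which adaptively balances the weights given to the kinematics data and the physics model. Unlike traditional approaches that hand-tune or simplify the physics model, it learns this balance and adapts automatically to different scenes and motions, which improves robustness and generalization.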

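Key design: The parameters of the meta-PD controller must be tuned carefully so that the torque predictions are accurate. The design of the recurrent neural network (RNN) is critical: a suitable architecture and training strategy are needed to learn an effective Kalman gain. The loss function should account for both kinematic error and physical plausibility, for example by using joint-angle error and torque smoothness as loss terms.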
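As an illustration of the two loss terms suggested above, the following sketch combines a joint-angle error with a torque smoothness penalty. The function names, the weight `w_smooth`, and the sample values are hypothetical; this summary does not state the loss the paper actually uses.

```python
from typing import List

Frames = List[List[float]]  # indexed as [frame][joint]


def joint_angle_loss(pred: Frames, target: Frames) -> float:
    """Mean squared joint-angle error over all frames and joints."""
    errors = [(p - t) ** 2
              for pf, tf in zip(pred, target)
              for p, t in zip(pf, tf)]
    return sum(errors) / len(errors)


def torque_smoothness_loss(torques: Frames) -> float:
    """Mean squared frame-to-frame change in torque; 0.0 for fewer than two frames."""
    diffs = [(b - a) ** 2
             for prev, curr in zip(torques, torques[1:])
             for a, b in zip(prev, curr)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def total_loss(pred: Frames, target: Frames, torques: Frames,
               w_smooth: float = 0.1) -> float:
    """Weighted sum of the kinematic term and the smoothness term."""
    return joint_angle_loss(pred, target) + w_smooth * torque_smoothness_loss(torques)


# Hypothetical two-frame, two-joint example.
pred = [[0.0, 1.0], [0.5, 1.0]]
target = [[0.0, 1.0], [0.0, 1.0]]
torques = [[1.0, 2.0], [1.0, 4.0]]

loss = total_loss(pred, target, torques)
```

The weight trades tracking accuracy against smooth actuation: a larger `w_smooth` penalizes abrupt torque changes more heavily at the cost of looser agreement with the target angles.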

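📊 Experimental highlights

The proposed method improves performance on the physics-based human pose estimation task. Comparisons with existing techniques show that it generates smoother, more physically plausible motion trajectories, improving physical plausibility while preserving kinematic accuracy.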

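🎯 Application scenarios

The results can be applied in virtual reality, game development, and animation production to make virtual characters move more realistically and improve interaction. The method can also support motion analysis and rehabilitation training by giving practitioners more accurate and reliable human motion data.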

📄 Abstract (original)

Human motion capture from monocular videos has made significant progress in recent years. However, modern approaches often produce temporal artifacts, e.g. in form of jittery motion and struggle to achieve smooth and physically plausible motions. Explicitly integrating physics, in form of internal forces and exterior torques, helps alleviating these artifacts. Current state-of-the-art approaches make use of an automatic PD controller to predict torques and reaction forces in order to re-simulate the input kinematics, i.e. the joint angles of a predefined skeleton. However, due to imperfect physical models, these methods often require simplifying assumptions and extensive preprocessing of the input kinematics to achieve good performance. To this end, we propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting, inspired by a neural Kalman-filtering approach. We develop a control loop as a meta-PD controller to predict internal joint torques and external reaction forces, followed by a physics-based motion simulation. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion, resulting in an optimal-state dynamics prediction. We show that this filtering step is crucial to provide an online supervision that helps balancing the shortcoming of the respective input motions, thus being important for not only capturing accurate global motion trajectories but also producing physically plausible human poses. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics, compared to state of the art. The code is available on https://github.com/cuongle1206/OSDCap
