Neural Fidelity Calibration for Informative Sim-to-Real Adaptation

📄 arXiv: 2504.08604v1

Authors: Youwei Yu, Lantao Liu

Categories: cs.RO, cs.AI, cs.LG, eess.SY

Published: 2025-04-11


💡 One-Sentence Takeaway

Proposes Neural Fidelity Calibration (NFC) to address sim-to-real adaptation.

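🎯 Matched areas: Pillar 1: Robot Control, Pillar 2: RL & Architecture

Keywords: neural fidelity calibration, deep reinforcement learning, sim-to-real, robot navigation, conditional diffusion models, policy fine-tuning, perception uncertainty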

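📋 Key Points

  1. Existing sim-to-real transfer methods rely on expert knowledge and struggle with perception uncertainty and simulator model bias.
  2. The Neural Fidelity Calibration (NFC) framework uses conditional diffusion models to calibrate physical coefficients and residual fidelity online, making the policy more adaptable.
  3. Experiments show that NFC achieves higher simulator calibration precision than existing methods across multiple robots, and that it performs well under challenging real-world conditions.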

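📝 Abstract (translated from Chinese)

Deep reinforcement learning can seamlessly transfer agile locomotion and navigation skills from simulation to the real world. However, bridging the sim-to-real gap with domain randomization or adversarial methods often requires expert physics knowledge to ensure policy robustness. In addition, cutting-edge simulators may fail to capture every real-world detail, and the reconstructed environment may introduce errors due to various perception uncertainties. To address these challenges, the paper proposes Neural Fidelity Calibration (NFC), a novel framework that uses conditional score-based diffusion models to calibrate the simulator's physical coefficients and residual fidelity domains online. The residual fidelity reflects the shift of the simulation model relative to real-world dynamics and captures the uncertainty of the perceived environment, which makes it possible to sample realistic environments under the inferred distribution for policy fine-tuning. The framework is informative and adaptive in three key ways: (a) it fine-tunes the pretrained policy only under anomalous scenarios, (b) it builds sequential NFC online from the pretrained NFC's proposal prior, and (c) when NFC uncertainty is high, it uses optimistic exploration to enable policy optimization.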

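🔬 Method Details

Problem definition: The paper addresses the fidelity problem in sim-to-real transfer for deep reinforcement learning. Existing methods often require expert knowledge and struggle with perception uncertainty and simulator model bias.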

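Core idea: The Neural Fidelity Calibration (NFC) framework uses conditional score-based diffusion models to calibrate the simulator's physical coefficients and residual fidelity online, improving policy robustness and adaptability.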
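
To make "calibrating a coefficient by sampling with a conditional score" concrete, here is a minimal sketch on a toy problem. It is not the paper's model: it swaps the learned diffusion network and noise schedule for a closed-form posterior score and plain unadjusted Langevin dynamics, and the friction readings, prior, and noise level are all hypothetical.

```python
import math
import random


def posterior_score(theta, observations, prior_mean, prior_std, noise_std):
    """Gradient of log p(theta | observations) for a linear-Gaussian toy model.

    Assumption: each observation is theta plus Gaussian noise. A learned
    conditional score network would stand in for this closed form.
    """
    prior_term = -(theta - prior_mean) / prior_std**2
    likelihood_term = sum(x - theta for x in observations) / noise_std**2
    return prior_term + likelihood_term


def langevin_sample(observations, prior_mean, prior_std, noise_std,
                    n_steps=200, step_size=1e-3, rng=None):
    """Draw one approximate posterior sample with unadjusted Langevin dynamics."""
    if rng is None:
        rng = random.Random()
    theta = rng.gauss(prior_mean, prior_std)  # initialise from the prior
    for _ in range(n_steps):
        score = posterior_score(theta, observations, prior_mean, prior_std, noise_std)
        # Drift along the score, then add Gaussian noise scaled by sqrt(step_size).
        theta += 0.5 * step_size * score + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return theta


def analytic_posterior(observations, prior_mean, prior_std, noise_std):
    """Closed-form Gaussian posterior (mean, std), used only as a reference."""
    precision = 1.0 / prior_std**2 + len(observations) / noise_std**2
    mean = (prior_mean / prior_std**2 + sum(observations) / noise_std**2) / precision
    return mean, math.sqrt(1.0 / precision)


# Hypothetical noisy friction-coefficient readings collected during execution.
observations = [0.62, 0.58, 0.65, 0.60]
rng = random.Random(0)
samples = [langevin_sample(observations, 1.0, 0.5, 0.1, rng=rng) for _ in range(500)]
sample_mean = sum(samples) / len(samples)
exact_mean, exact_std = analytic_posterior(observations, 1.0, 0.5, 0.1)
print(f"Langevin mean {sample_mean:.3f}, closed-form mean {exact_mean:.3f}")
```

The samples form a distribution over the coefficient rather than a point estimate, which is the property the summary relies on when it talks about sampling environments under the inferred distribution.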

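Technical framework: NFC comprises three main modules: 1) an online calibration module that adjusts physical coefficients in real time; 2) a residual fidelity estimation module that captures the dynamics shift between simulation and reality; and 3) a policy fine-tuning module that optimizes the policy under anomalous scenarios.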
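
The summary does not say how an anomalous scenario is detected. As an illustration of gating fine-tuning on an anomaly check, the sketch below assumes a simple criterion: the root-mean-square gap between a simulated rollout and the observed one exceeds a threshold. The function names, the scalar states, and the threshold value are all hypothetical.

```python
import math


def rollout_discrepancy(sim_states, real_states):
    """Root-mean-square gap between simulated and observed scalar states."""
    if len(sim_states) != len(real_states) or not sim_states:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(s - r) ** 2 for s, r in zip(sim_states, real_states)]
    return math.sqrt(sum(squared) / len(squared))


def should_fine_tune(sim_states, real_states, threshold):
    """Assumed gate: trigger fine-tuning only when the gap exceeds the threshold."""
    return rollout_discrepancy(sim_states, real_states) > threshold


# Hypothetical forward-position traces (metres) over four control steps.
simulated = [0.0, 0.1, 0.2, 0.3]
nominal_real = [0.0, 0.11, 0.19, 0.31]   # small sensor-level mismatch
degraded_real = [0.0, 0.05, 0.08, 0.10]  # e.g. the robot is slipping

print(should_fine_tune(simulated, nominal_real, threshold=0.05))
print(should_fine_tune(simulated, degraded_real, threshold=0.05))
```

Gating this way keeps the pretrained policy untouched during nominal operation, so the cost of calibration and fine-tuning is paid only when the simulator stops explaining what the robot observes.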

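Key innovation: By introducing residual fidelity, NFC allows policy fine-tuning to proceed even when perception uncertainty is high, which improves policy adaptability and robustness.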

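Key design: NFC uses conditional diffusion models. Sequential NFC is built online from the pretrained NFC's proposal prior, which reduces the diffusion model's training burden, and the pretrained policy is fine-tuned only when needed. When NFC uncertainty is high, optimistic exploration is used for policy optimization.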
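
The summary does not define the optimistic exploration rule. One common way to express optimism under uncertainty is an upper-confidence bonus, sketched below as a stand-in rather than the paper's method: a candidate policy is scored by its mean return over sampled environments, and a bonus proportional to the spread of those returns is added only when calibration uncertainty is above a threshold. All names and numbers are hypothetical.

```python
import statistics


def fine_tuning_objective(returns_per_env, uncertainty, uncertainty_threshold, beta=1.0):
    """Score a candidate policy from its returns in sampled environments.

    returns_per_env: returns under environments drawn from the calibrated distribution.
    uncertainty: scalar spread of the calibration samples (assumed to be given).
    """
    mean_return = statistics.fmean(returns_per_env)
    if uncertainty <= uncertainty_threshold:
        # Calibration is trusted: optimise the plain expected return.
        return mean_return
    # Calibration is uncertain: add an optimistic bonus from the return spread.
    return mean_return + beta * statistics.pstdev(returns_per_env)


returns = [1.0, 2.0, 3.0]
confident_score = fine_tuning_objective(returns, uncertainty=0.1, uncertainty_threshold=0.5)
optimistic_score = fine_tuning_objective(returns, uncertainty=0.9, uncertainty_threshold=0.5)
print(confident_score, optimistic_score)
```

The bonus rewards policies that could do well under some plausible environment, which encourages exploration instead of stalling when the calibrated distribution is too wide to trust its mean.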

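📊 Experimental Highlights

Experiments show that NFC achieves higher simulator calibration precision than state-of-the-art methods across diverse robots with high-dimensional parameter spaces. It also demonstrates robust robot navigation under challenging real-world conditions such as a broken wheel axle.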

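🎯 Application Scenarios

The work has potential applications in robot navigation, autonomous driving, and smart manufacturing. By improving adaptation between simulation and reality, NFC could improve how robots operate in complex environments and support real-world deployment of intelligent systems.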

📄 Abstract (original)

Deep reinforcement learning can seamlessly transfer agile locomotion and navigation skills from the simulator to real world. However, bridging the sim-to-real gap with domain randomization or adversarial methods often demands expert physics knowledge to ensure policy robustness. Even so, cutting-edge simulators may fall short of capturing every real-world detail, and the reconstructed environment may introduce errors due to various perception uncertainties. To address these challenges, we propose Neural Fidelity Calibration (NFC), a novel framework that employs conditional score-based diffusion models to calibrate simulator physical coefficients and residual fidelity domains online during robot execution. Specifically, the residual fidelity reflects the simulation model shift relative to the real-world dynamics and captures the uncertainty of the perceived environment, enabling us to sample realistic environments under the inferred distribution for policy fine-tuning. Our framework is informative and adaptive in three key ways: (a) we fine-tune the pretrained policy only under anomalous scenarios, (b) we build sequential NFC online with the pretrained NFC's proposal prior, reducing the diffusion model's training burden, and (c) when NFC uncertainty is high and may degrade policy improvement, we leverage optimistic exploration to enable hallucinated policy optimization. Our framework achieves superior simulator calibration precision compared to state-of-the-art methods across diverse robots with high-dimensional parametric spaces. We study the critical contribution of residual fidelity to policy improvement in simulation and real-world experiments. Notably, our approach demonstrates robust robot navigation under challenging real-world conditions, such as a broken wheel axle on snowy surfaces.