Autonomous Obstacle Removal for Excavators through Policy Learning with Particle Simulation

📄 arXiv: 2606.09183v1

Authors: Yuki Kadokawa, Sandro M. Alcantara Tacora, Taro Abe, Daisuke Endo, Genki Yamauchi, Takeshi Hashimoto, Takamitsu Matsubara

Categories: cs.RO

Published: 2026-06-08

Comments: under review


💡 One-Sentence Summary

Proposes a particle-simulation-based policy learning framework for autonomous obstacle removal with excavators.

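🎯 Matched areas: Pillar 1: Robot Control; Pillar 2: RL Algorithms & Architecture

Keywords: autonomous obstacle removal, particle simulation, policy learning, excavator, curriculum learning, computer vision, robotics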

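📋 Key Points

  1. Autonomous obstacle removal with excavators is hard to learn: particle simulation is computationally expensive, and the policy must adapt to changing soil-obstacle conditions.
  2. The paper proposes a time-efficient policy learning framework based on particle simulation, combined with a burial-conditioned curriculum that streamlines training.
  3. Experiments show that the framework learns an effective policy within three days, whereas baseline methods fail even after a full week of training.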

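📝 Abstract (Summary)

Autonomously removing obstacles from the ground is an important earthwork task, but it is difficult to automate because the excavator must adapt its excavation trajectories as soil and obstacle conditions change. Learning this state-dependent behavior requires a training environment that reproduces soil-obstacle interactions. The paper proposes a time-efficient policy learning framework built on particle simulation, with a burial-conditioned curriculum that progressively increases burial depth to control both task difficulty and simulation cost. Experiments show that the framework learns an effective obstacle-removal policy within three days and transfers it to a real excavator, demonstrating robust obstacle removal.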

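🔬 Method Details

Problem definition: The paper targets adaptability and training efficiency in autonomous obstacle removal with excavators. The excavator must adapt its trajectories over repeated cycles as soil-obstacle conditions change, and the particle simulation needed to reproduce those interactions is computationally expensive, so policy learning is costly.

Core idea: The paper proposes a policy learning framework based on particle simulation. A burial-conditioned curriculum starts with shallow burial and progressively increases burial depth while adjusting the particle count, which controls task difficulty and simulation cost at the same time.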
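
As a rough illustration of the curriculum idea, the sketch below pairs each stage with a burial depth and a particle count and advances once the policy succeeds often enough. It is not the paper's implementation: the names `CurriculumStage`, `build_curriculum`, and `next_stage`, the linear interpolation, the success-rate threshold, and every numeric value are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumStage:
    burial_depth_m: float  # how deep the obstacle is buried in this stage
    particle_count: int    # soil particles simulated in this stage


def build_curriculum(
    num_stages: int = 4,
    min_depth_m: float = 0.1,      # hypothetical shallowest burial
    max_depth_m: float = 0.7,      # hypothetical deepest burial
    min_particles: int = 20_000,   # hypothetical budget at the shallowest stage
    max_particles: int = 200_000,  # hypothetical budget at the deepest stage
) -> list[CurriculumStage]:
    """Linearly interpolate burial depth and particle count across stages."""
    if num_stages < 2:
        raise ValueError("need at least two stages to form a curriculum")
    stages = []
    for i in range(num_stages):
        t = i / (num_stages - 1)  # 0.0 at the first stage, 1.0 at the last
        depth = min_depth_m + t * (max_depth_m - min_depth_m)
        particles = round(min_particles + t * (max_particles - min_particles))
        stages.append(CurriculumStage(depth, particles))
    return stages


def next_stage(current: int, success_rate: float, num_stages: int,
               threshold: float = 0.8) -> int:
    """Advance one stage once the success rate reaches the (assumed) threshold."""
    if success_rate >= threshold and current < num_stages - 1:
        return current + 1
    return current
```

Early stages are cheap because shallow burial needs fewer particles, so most of the expensive deep-burial simulation is spent on a policy that already handles the easier cases.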

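Technical framework: The overall architecture comprises a data acquisition module (terrain and obstacle information from RGB-D sensors), a particle simulation module (environment simulation), a policy learning module (generation of excavation trajectories), and a field testing module (validation of the policy on a real excavator).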
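
One way to picture how these modules fit together is a shared observation-action interface that both the simulator and the real machine implement, so the same policy loop runs against either. The sketch below is a toy under stated assumptions: `Observation`, `TrajectoryParams`, `ExcavationBackend`, `ToySimBackend`, and `run_cycles` are hypothetical names, and the one-line soil model stands in for a real particle simulator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Observation:
    soil_cover_m: float     # stand-in for terrain data derived from RGB-D
    obstacle_visible: bool  # whether the obstacle is exposed


@dataclass(frozen=True)
class TrajectoryParams:
    dig_depth_m: float      # single illustrative trajectory parameter


class ExcavationBackend(Protocol):
    """Interface a simulator or a real excavator would both implement."""
    def observe(self) -> Observation: ...
    def execute(self, params: TrajectoryParams) -> bool: ...


class ToySimBackend:
    """Toy stand-in for the particle simulator: each dig lowers the soil cover."""

    def __init__(self, burial_depth_m: float) -> None:
        self.cover_m = burial_depth_m

    def observe(self) -> Observation:
        return Observation(self.cover_m, self.cover_m <= 0.0)

    def execute(self, params: TrajectoryParams) -> bool:
        if self.cover_m <= 0.0:
            return True  # obstacle already exposed, so this cycle removes it
        self.cover_m = max(0.0, self.cover_m - params.dig_depth_m)
        return False


def fixed_depth_policy(obs: Observation) -> TrajectoryParams:
    """Placeholder policy: always dig 0.2 m, regardless of the observation."""
    return TrajectoryParams(dig_depth_m=0.2)


def run_cycles(backend: ExcavationBackend,
               policy: Callable[[Observation], TrajectoryParams],
               max_cycles: int = 10) -> Optional[int]:
    """Repeat observe -> act; return the cycle that removed the obstacle, or None."""
    for cycle in range(1, max_cycles + 1):
        if backend.execute(policy(backend.observe())):
            return cycle
    return None
```

Because `run_cycles` depends only on the `ExcavationBackend` protocol, swapping the toy backend for a hardware-facing one would leave the policy loop unchanged, which is the property a sim-to-real setup relies on.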

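Key innovation: The main technical contribution is the burial-conditioned curriculum learning strategy. By controlling burial depth and particle count together, it streamlines learning and reduces computational cost, in contrast to training under a fixed environment configuration.

Key design: The policy outputs a parameterized excavation trajectory. The loss function is designed around excavation efficiency and obstacle-removal success rate, and the network is a deep learning model that processes RGB-D input.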
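
To make "parameterized excavation trajectory" concrete, the sketch below decodes a normalized action vector into a few physical dig parameters and expands them into bucket-tip waypoints. The parameter set (`start_x_m`, `depth_m`, `drag_m`), the value ranges, and the four-waypoint shape are assumptions chosen for illustration; the source does not state which parameters the policy actually outputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DigParams:
    start_x_m: float  # horizontal distance of the entry point from the machine
    depth_m: float    # penetration depth below ground level
    drag_m: float     # length of the drag toward the machine


# Hypothetical physical ranges for each parameter, in DigParams field order.
RANGES = {"start_x_m": (3.0, 7.0), "depth_m": (0.1, 1.0), "drag_m": (0.5, 2.5)}


def decode_action(action: Sequence[float]) -> DigParams:
    """Map a normalized action in [-1, 1]^3 to physical dig parameters."""
    if len(action) != len(RANGES):
        raise ValueError(f"expected {len(RANGES)} action values")
    values = []
    for a, (lo, hi) in zip(action, RANGES.values()):
        a = max(-1.0, min(1.0, a))  # clip out-of-range network outputs
        values.append(lo + (a + 1.0) / 2.0 * (hi - lo))
    return DigParams(*values)


def waypoints(p: DigParams, clearance_m: float = 0.5) -> list[tuple[float, float]]:
    """Bucket-tip waypoints (x, z), z = 0 at ground: approach, penetrate, drag, lift."""
    end_x = p.start_x_m - p.drag_m
    return [
        (p.start_x_m, clearance_m),  # hover above the entry point
        (p.start_x_m, -p.depth_m),   # penetrate the soil
        (end_x, -p.depth_m),         # drag toward the machine
        (end_x, clearance_m),        # lift out
    ]
```

A low-dimensional action like this keeps the policy's output space small, while the waypoint expansion can be handed to whatever motion controller the machine uses.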

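📊 Experimental Highlights

The proposed framework learns an effective obstacle-removal policy within three days, whereas baseline methods fail even after a full week of training. The learned policy transfers to a real 12-ton excavator and handles various steel obstacles on open ground, demonstrating practical applicability.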

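🎯 Application Scenarios

Potential applications include building construction, mining, and urban infrastructure maintenance, where the approach could improve an excavator's autonomy in complex environments and reduce labor costs and safety risks. The technique may also extend to other types of automated equipment and support the development of intelligent construction.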

📄 Abstract (Original)

Autonomous obstacle removal from the ground is an important earthwork task, but this is difficult to automate because an excavator must adapt its excavation trajectories over repeated cycles as soil-obstacle conditions change. Learning such state-dependent behavior requires a training environment that reproduces accumulated soil-obstacle interactions, including contact states, terrain deformation, and obstacle visibility. Accordingly, particle-based simulation is suitable for the relevant policy learning. However, particle simulation is computationally expensive, and repeated excavation cycles further increase the learning cost. We observe that the burial condition of an obstacle governs both task difficulty and simulation cost: deeper burial makes obstacle removal harder while also requiring more particles for accurate simulation. This observation motivates a burial-conditioned curriculum learning strategy. We propose a time-efficient sim-to-real policy learning framework in which the policy observes terrain and obstacle information from RGB-D measurements and then outputs a parameterized excavation trajectory; in this process, the simulator reproduces in a real-world excavator the same observation-action interface it uses under controllable burial conditions. The curriculum begins with shallow burial conditions and progressively increases burial depth while adjusting particle count, thus simultaneously controlling task difficulty and simulation cost. Experiments show that the proposed framework successfully learns an effective obstacle-removal policy, whereas baseline methods fail even after a full week of training. The proposed curriculum achieves effective performance within three days and achieves successful transfer to a real 12-ton excavator operating on open ground with various steel obstacles, thus demonstrating robust obstacle removal.