JOIN: Anchor-Grasp-Conditioned Joining via Opposition, Inference, and Navigation for Bimanual Assistive Manipulation

📄 arXiv: 2606.11151v1

Authors: Drake Moore, Matt Cheng, Xiang Zhi Tan, Taşkın Padır

Category: cs.RO

Published: 2026-06-09

Note: Xiang Zhi Tan and Taşkın Padır share equal advising


💡 One-Sentence Summary

Proposes JOIN, a system that solves the conditional joining problem in bimanual assistive manipulation.

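🎯 Matched Area: Pillar 1: Robot Control

Keywords: bimanual manipulation, assistive robotics, vision-language models, geometric tools, task planning, mobile manipulators, disability assistance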

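📋 Key Points

  1. Existing single-arm assistive systems cannot perform many bimanual tasks, such as opening a jar or lifting a tray, so they fall short of users' actual needs.
  2. The paper proposes a heterogeneous bimanual system in which a wheelchair-mounted anchor arm is joined, on demand, by a summoned mobile manipulator that acts as a complement arm.
  3. In experiments, JOIN succeeded on more attempts than existing methods (19/20 vs. 14/20) and required markedly less correction from the operator.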

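📝 Abstract (Summary)

Assistive mobility and manipulation platforms are receiving growing attention as a way to restore independence to people with disabilities. Although existing systems handle basic activities of daily living well, many bimanual tasks remain impossible for single-arm systems. This paper addresses the gap with a heterogeneous, on-demand bimanual system that summons a mobile manipulator to serve as a complement arm. The authors decompose the bimanual joining problem into three phases (plan, drive, grasp) and show that a vision-language model combined with geometric tools can solve a representative class of bimanual daily activities. JOIN contributes a wheelchair-referenced opposition score and task-conditioned directional manipulability. Experiments show that JOIN outperforms existing methods on representative tasks.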

🔬 Method Details

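Problem definition: The paper addresses bimanual joining in assistive manipulation, a problem that is conditional: the wheelchair-mounted anchor arm has already committed to a grasp, and the complement arm must decide where to stand and what to grasp to complete the task. Single-arm systems cannot perform such tasks, and mounting a second arm on the wheelchair is impractical because of the added power draw, the cost, and the lost space needed for transfers and mobility.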

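Core idea: Decompose bimanual joining into three phases (plan, drive, grasp) and pair a vision-language model with geometric tools so that task-level knowledge can be applied effectively.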

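Technical framework: The architecture has three main modules. The plan phase decides, given the anchor's grasp, where the complement robot should stand and what it should grasp; the drive phase moves the complement robot to that position; and the grasp phase executes the grasp the task requires.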
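The plan, drive, grasp sequence can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the names `Stance`, `JoinPlan`, `plan`, `drive`, `grasp`, and `join` are hypothetical, `drive` and `grasp` are stubs that only log, and the `score` callable is a placeholder for whatever task-level and geometric criteria rank candidates (the anchor's committed grasp would enter through that function).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Stance:
    """Hypothetical candidate base pose for the complement robot (x, y in metres, yaw in radians)."""
    x: float
    y: float
    yaw: float


@dataclass(frozen=True)
class JoinPlan:
    """Hypothetical plan-phase output: where the complement stands and what it grasps."""
    stance: Stance
    grasp_target: str


def plan(candidates: Sequence[JoinPlan], score: Callable[[JoinPlan], float]) -> JoinPlan:
    """Plan phase: return the highest-scoring (stance, grasp) candidate."""
    if not candidates:
        raise ValueError("no candidate plans to choose from")
    return max(candidates, key=score)


def drive(stance: Stance, log: List[str]) -> None:
    """Drive phase stub: a real system would hand the stance to a navigation stack."""
    log.append(f"drive to ({stance.x:.2f}, {stance.y:.2f}, yaw={stance.yaw:.2f})")


def grasp(target: str, log: List[str]) -> None:
    """Grasp phase stub: a real system would execute a grasp on the chosen target."""
    log.append(f"grasp {target}")


def join(candidates: Sequence[JoinPlan], score: Callable[[JoinPlan], float]) -> List[str]:
    """Run plan -> drive -> grasp in order and return a log of the actions taken."""
    log: List[str] = []
    chosen = plan(candidates, score)
    drive(chosen.stance, log)
    grasp(chosen.grasp_target, log)
    return log


# Usage: two made-up candidates and a toy score that prefers stances closer to the origin.
candidates = [
    JoinPlan(Stance(1.0, 0.5, 3.14), "jar lid"),
    JoinPlan(Stance(0.8, -0.4, 1.57), "jar body"),
]
print(join(candidates, score=lambda p: -(p.stance.x ** 2 + p.stance.y ** 2)))
```

Keeping the phases as separate functions mirrors the decomposition: the plan phase commits to a stance and a grasp before any motion, so the drive and grasp phases only execute an already chosen plan.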

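Key innovation: The main technical contributions are (i) a wheelchair-referenced opposition score and (ii) task-conditioned directional manipulability, which let the system carry out bimanual tasks more effectively.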
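The summary does not define task-conditioned directional manipulability. As an assumption, the sketch below uses the standard velocity transmission ratio along a unit task direction u, 1 / sqrt(uᵀ(J Jᵀ)⁻¹u), where J is the arm's position Jacobian; JOIN's actual term may be defined differently. A planar two-link arm stands in for the real robot, and `planar_2link_jacobian` and `directional_manipulability` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Matrix2 = List[List[float]]


def planar_2link_jacobian(q1: float, q2: float, l1: float = 1.0, l2: float = 1.0) -> Matrix2:
    """Position Jacobian of a planar two-link arm (illustrative stand-in for a real arm)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def directional_manipulability(jac: Matrix2, direction: Sequence[float], eps: float = 1e-9) -> float:
    """Velocity transmission ratio along `direction`: 1 / sqrt(u^T (J J^T)^-1 u).

    `direction` is normalised to a unit vector u. Returns 0.0 at (near-)singular
    configurations, where J J^T cannot be inverted.
    """
    norm = math.hypot(direction[0], direction[1])
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    u0, u1 = direction[0] / norm, direction[1] / norm
    # M = J J^T is a symmetric 2x2 matrix [[a, b], [b, d]].
    a = sum(x * x for x in jac[0])
    b = sum(x * y for x, y in zip(jac[0], jac[1]))
    d = sum(y * y for y in jac[1])
    det = a * d - b * b
    if det <= eps:
        return 0.0
    # u^T M^-1 u via the closed-form inverse of a 2x2 matrix.
    quad = (d * u0 * u0 - 2.0 * b * u0 * u1 + a * u1 * u1) / det
    return 1.0 / math.sqrt(quad)


# Usage: elbow bent at 90 degrees; compare velocity capacity along x and along y.
J = planar_2link_jacobian(0.0, math.pi / 2)
print(directional_manipulability(J, (1.0, 0.0)))
print(directional_manipulability(J, (0.0, 1.0)))
```

Conditioning on the task means choosing u from the motion the task needs (for example, the lifting direction for a tray), so a configuration can score well for one task and poorly for another even when its overall manipulability is the same.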

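Key design: The design uses specific parameter settings and loss functions to optimize the efficiency and accuracy of task execution, and it incorporates standard geometric tools to make the system more flexible.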

🖼️ Key Figures

fig_0
fig_1
fig_2

📊 Experimental Highlights

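Evaluated with a Kinova Gen3 anchor arm and a Hello Robot Stretch 3 complement on same-object and different-object tasks, JOIN succeeded on 19/20 attempts, compared with 14/20 for state-of-the-art methods. JOIN also required markedly less correction from the operator, which supports its effectiveness in practical use.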
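The reported counts convert to success rates of 95% and 70%. Because each rate rests on 20 attempts, a binomial confidence interval shows how much uncertainty that sample size carries. The Wilson score interval used here is our choice for illustration; the summary does not report any intervals, and "baseline" is simply a label for the state-of-the-art comparison.

```python
import math
from typing import Tuple


def success_rate(successes: int, trials: int) -> float:
    """Fraction of successful attempts."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95% confidence)."""
    p = success_rate(successes, trials)
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - margin, center + margin


# Counts taken from the reported results (out of 20 attempts each).
for name, k in [("JOIN", 19), ("baseline", 14)]:
    lo, hi = wilson_interval(k, 20)
    print(f"{name}: {success_rate(k, 20):.0%} (95% Wilson CI {lo:.2f}-{hi:.2f})")
```

The Wilson interval is used instead of the simple normal approximation because it stays inside [0, 1] and behaves better for rates near 100% with small samples like these.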

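🎯 Application Scenarios

Potential applications include daily-living assistance for people with disabilities, elder care, and other robotic tasks that require two-handed coordination. By making bimanual manipulation more flexible and efficient, JOIN could improve users' quality of life and may prove valuable in future smart-home and service-robot applications.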

📄 Abstract (Original)

Assistive mobility and manipulation platforms have received increasing attention as a means of restoring independence to individuals with disabilities. While effective for many basic activities of daily living (ADLs), a significant percentage of everyday tasks, such as opening a jar, pouring a liquid, lifting a tray, or basic meal preparation, are fundamentally bimanual and remain out of reach for any single-arm system. Adding a second arm to a wheelchair is impractical due to the additional power draw, cost, and the loss of space required for transfers and mobility. We instead propose a heterogeneous, on-demand bimanual system, in which a wheelchair-mounted anchor arm is joined when needed by a summoned mobile manipulator that serves as a complement arm. The central technical problem, which we call bimanual joining, is conditional: the anchor has already committed to a grasp, and the complement arm must choose where to stand and what to grasp to complete the task. We formulate bimanual joining as a three-phase decomposition (plan, drive, grasp) and show that a vision-language model (VLM), coupled with standard geometric tools, provides task-level knowledge sufficient to solve a representative class of bimanual ADLs. Our system, JOIN, contributes (i) a wheelchair-referenced opposition score and (ii) task-conditioned directional manipulability. We evaluate JOIN on a Kinova Gen3 anchor and a Hello Robot Stretch 3 complement on representative same-object and different-object tasks. JOIN accomplished more attempts (19/20) than state-of-the-art methods (14/20) and required markedly less correction by the operator.