One-Shot Affordance Grounding of Deformable Objects in Egocentric Organizing Scenes

Authors: Wanjun Jia, Fan Yang, Mengfei Duan, Xianchi Chen, Yinxi Wang, Yiming Jiang, Wenrui Chen, Kailun Yang, Zhiyong Li

Categories: cs.CV, cs.RO, eess.IV

Published: 2025-03-03 (updated: 2025-07-22)

Note: Accepted to IROS 2025. Source code and benchmark dataset will be publicly available at https://github.com/Dikay1/OS-AGDO

🔗 Code/Project: GitHub (https://github.com/Dikay1/OS-AGDO)

💡 One-sentence summary

Proposes OS-AGDO, a method for one-shot affordance grounding of deformable objects in robotic manipulation.

🎯 Matched areas: Pillar 1: Robot Control; Pillar 3: Perception & Semantics; Pillar 6: Video Extraction

Keywords: deformable object manipulation, affordance grounding, one-shot learning, semantic enhancement, keypoint fusion

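📋 Key points

Existing methods for deformable object manipulation struggle with uncertain component properties, visual interference, and ambiguous prompts.
OS-AGDO combines DefoSEM, OEKFM, and an instance-conditional prompt to strengthen structural understanding and feature extraction for deformable objects and to reduce prompt ambiguity.
On the AGDDO15 dataset, OS-AGDO improves the KLD, SIM, and NSS metrics by 6.2%, 3.2%, and 2.9%, respectively.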

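📝 Abstract (translated)

This paper proposes a new method for One-Shot Affordance Grounding of Deformable Objects (OS-AGDO) in egocentric organizing scenes. It targets the perception and control difficulties that arise when robots manipulate deformable objects, which stem from uncertain component properties, diverse configurations, visual interference, and ambiguous prompts. The method lets a robot recognize previously unseen deformable objects of varying colors and shapes from minimal samples. First, a Deformable Object Semantic Enhancement Module (DefoSEM) strengthens the hierarchical understanding of internal structure and improves the accurate identification of local features when component information is weak. Second, an ORB-Enhanced Keypoint Fusion Module (OEKFM) uses geometric constraints to optimize feature extraction for key components, improving adaptability to diversity and visual interference. In addition, an instance-conditional prompt based on image data and task context mitigates the region ambiguity caused by prompt words. The method is validated on AGDDO15, a real-world dataset constructed by the authors. Experimental results show that it significantly outperforms existing methods, with improvements of 6.2%, 3.2%, and 2.9% in the KLD, SIM, and NSS metrics, respectively, and that it generalizes well.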

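🔬 Method details

Problem definition: The paper addresses the localization of affordance regions on deformable objects under a one-shot setting for robotic manipulation. Existing methods achieve limited grounding accuracy and generalize poorly on deformable objects because of their variable shapes and uncertain component properties, together with visual interference and ambiguous prompts.

Core idea: The approach strengthens the understanding of a deformable object's internal structure, optimizes feature extraction for its key components, and adds an instance-conditional prompt to reduce the region ambiguity caused by prompt words. Together these let the model capture the semantics of deformable objects more reliably and localize affordance regions more accurately.

Technical framework: OS-AGDO consists of three modules: the Deformable Object Semantic Enhancement Module (DefoSEM), the ORB-Enhanced Keypoint Fusion Module (OEKFM), and the Instance-Conditional Prompt. DefoSEM strengthens the hierarchical understanding of the object's internal structure. OEKFM uses geometric constraints to optimize feature extraction for key components. The Instance-Conditional Prompt draws on image data and task context to reduce the region ambiguity caused by prompt words. In the overall pipeline, DefoSEM and OEKFM first extract image features, which are then combined with the Instance-Conditional Prompt to produce the final affordance grounding result.

Key innovations: The main contribution lies in how the three modules work together. DefoSEM improves the extraction of semantic information through hierarchical understanding, OEKFM refines keypoint features with geometric constraints, and the Instance-Conditional Prompt reduces prompt ambiguity. Compared with existing methods, OS-AGDO places more emphasis on the internal structure of deformable objects and on the features of their key components, which improves grounding accuracy and generalization.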

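Key design: The DefoSEM module adopts a hierarchical convolutional neural network structure that uses convolution kernels of different scales to extract features at different levels. The OEKFM module extracts keypoints with the ORB algorithm and fuses features under geometric constraints. The Instance-Conditional Prompt module dynamically generates the embedding of the prompt words from the image data and task context, which reduces prompt ambiguity. Grounding accuracy is evaluated with the KLD, SIM, and NSS metrics.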
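To make the idea of keypoint-guided feature fusion more concrete, the following is a minimal sketch of one simple way detected keypoints could reweight a feature map. It is not the paper's OEKFM: the function names `keypoint_prior` and `fuse`, the `(x, y, response)` tuple layout, the Gaussian spread `sigma`, and the multiplicative rule `f * (1 + alpha * prior)` are all assumptions made for illustration. Keypoints are taken as given input, for example from an ORB detector.

```python
import math


def keypoint_prior(keypoints, height, width, sigma=2.0):
    """Build an H x W map in [0, 1] that peaks at the given keypoints.

    keypoints: iterable of (x, y, response) tuples. This layout is an
    assumption of the sketch, not something taken from the paper.
    """
    prior = [[0.0] * width for _ in range(height)]
    for kx, ky, response in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = response * math.exp(-d2 / (2.0 * sigma ** 2))
                # Keep the strongest keypoint contribution at each pixel.
                prior[y][x] = max(prior[y][x], value)
    peak = max(max(row) for row in prior)
    if peak > 0:
        prior = [[v / peak for v in row] for row in prior]
    return prior


def fuse(feature_map, prior, alpha=1.0):
    """Reweight a single-channel H x W feature map: f * (1 + alpha * prior).

    Hypothetical fusion rule: responses near keypoints are amplified,
    responses far from any keypoint are left almost unchanged.
    """
    return [
        [f * (1.0 + alpha * p) for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(feature_map, prior)
    ]
```

With a single keypoint at the center of a 3 x 3 map, `fuse` doubles the center response when `alpha=1.0` and amplifies the corners less, which is the behavior one would want from a prior that emphasizes key components while tolerating background clutter.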


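📊 Experimental highlights

The experiments show that OS-AGDO significantly outperforms existing methods on the AGDDO15 dataset, improving the KLD, SIM, and NSS metrics by 6.2%, 3.2%, and 2.9%, respectively. This indicates a clear advantage in affordance grounding for deformable objects, along with good generalization.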
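For readers unfamiliar with the three metrics, the sketch below shows how KLD, SIM, and NSS are commonly computed between a predicted heatmap and a ground-truth heatmap in the saliency and affordance grounding literature. It follows the standard definitions and is not taken from the OS-AGDO code. The helper names, the `EPS` constant, and the `threshold=0.1` used to binarize the ground truth for NSS are assumptions of this sketch.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def _flatten(heatmap):
    """Flatten a 2-D list of non-negative numbers into a 1-D list of floats."""
    return [float(v) for row in heatmap for v in row]


def _to_distribution(values):
    """Scale non-negative values so that they sum to (approximately) 1."""
    total = sum(values)
    return [v / (total + EPS) for v in values]


def kld(pred, gt):
    """KL divergence from ground truth to prediction. Lower is better."""
    p = _to_distribution(_flatten(pred))
    q = _to_distribution(_flatten(gt))
    return sum(qi * math.log(EPS + qi / (pi + EPS)) for pi, qi in zip(p, q))


def sim(pred, gt):
    """Histogram intersection of the two distributions. Higher is better."""
    p = _to_distribution(_flatten(pred))
    q = _to_distribution(_flatten(gt))
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def nss(pred, gt, threshold=0.1):
    """Mean standardized prediction over ground-truth pixels. Higher is better.

    The ground truth is min-max scaled and binarized at `threshold`
    (an assumed value) to select the pixels that count as targets.
    """
    p = _flatten(pred)
    g = _flatten(gt)
    mean = sum(p) / len(p)
    std = math.sqrt(sum((v - mean) ** 2 for v in p) / len(p))
    z = [(v - mean) / (std + EPS) for v in p]
    g_min, g_max = min(g), max(g)
    mask = [(v - g_min) / (g_max - g_min + EPS) > threshold for v in g]
    hits = [zi for zi, m in zip(z, mask) if m]
    return sum(hits) / max(len(hits), 1)
```

Note the direction of each metric: a better prediction lowers KLD but raises SIM and NSS, so an "improvement" in KLD means a decrease.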

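🎯 Application scenarios

The results can be applied to robot-assisted manipulation of deformable objects, for example in clothing organization, medical surgery, and food processing. More accurate localization of affordance regions on deformable objects could improve the efficiency and safety of such operations and enable more flexible automation.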

📄 Abstract (original)

Deformable object manipulation in robotics presents significant challenges due to uncertainties in component properties, diverse configurations, visual interference, and ambiguous prompts. These factors complicate both perception and control tasks. To address these challenges, we propose a novel method for One-Shot Affordance Grounding of Deformable Objects (OS-AGDO) in egocentric organizing scenes, enabling robots to recognize previously unseen deformable objects with varying colors and shapes using minimal samples. Specifically, we first introduce the Deformable Object Semantic Enhancement Module (DefoSEM), which enhances hierarchical understanding of the internal structure and improves the ability to accurately identify local features, even under conditions of weak component information. Next, we propose the ORB-Enhanced Keypoint Fusion Module (OEKFM), which optimizes feature extraction of key components by leveraging geometric constraints and improves adaptability to diversity and visual interference. Additionally, we propose an instance-conditional prompt based on image data and task context, which effectively mitigates the issue of region ambiguity caused by prompt words. To validate these methods, we construct a diverse real-world dataset, AGDDO15, which includes 15 common types of deformable objects and their associated organizational actions. Experimental results demonstrate that our approach significantly outperforms state-of-the-art methods, achieving improvements of 6.2%, 3.2%, and 2.9% in KLD, SIM, and NSS metrics, respectively, while exhibiting high generalization performance. Source code and benchmark dataset are made publicly available at https://github.com/Dikay1/OS-AGDO.
