TomatoScanner: phenotyping tomato fruit based on only RGB image

📄 arXiv: 2503.05568v1 📥 PDF

Authors: Xiaobei Zhao, Xiangrong Zeng, Yihang Ma, Pengjin Tang, Xiang Li

Category: cs.CV

Published: 2025-03-07

Note: 12 pages, 37 figures. Code and datasets are open-sourced at https://github.com/AlexTraveling/TomatoScanner

🔗 Code/Project: GITHUB


💡 One-sentence takeaway

TomatoScanner is proposed to measure tomato fruit phenotypes from RGB images alone

🎯 Matched area: Pillar 3: Spatial Perception & Semantics (Perception & Semantics)

Keywords: tomato phenotyping, computer vision, instance segmentation, depth estimation, agricultural automation

📋 Key points

  1. Existing phenotyping methods rely on manual labor, which is inefficient and can endanger human health; existing 2D methods either require extra calibration or damage the fruit.
  2. The proposed TomatoScanner needs only RGB images as input: EdgeYOLO performs instance segmentation to extract pixel features, which are then fused with depth features from depth estimation.
  3. Experiments show that TomatoScanner measures multiple phenotypic traits accurately, and that EdgeYOLO's segmentation accuracy improves markedly while remaining lightweight and efficient.

📝 Abstract (translated)

In tomato greenhouses, phenotypic measurement helps researchers and farmers monitor crop growth, enabling timely and precise control of environmental conditions to improve crop quality and yield. Traditional phenotyping relies mainly on manual measurement, which is accurate but inefficient and poses risks to human health and safety. This paper proposes TomatoScanner, a non-contact tomato fruit phenotyping method that requires only RGB images as input. EdgeYOLO performs instance segmentation to extract pixel features, Depth Pro estimates depth, and the pixel and depth features are fused to output the final phenotype results. On a self-built tomato phenotype dataset, TomatoScanner performs well on width, height, vertical area and volume, with median relative errors of 5.63%, 7.03%, -0.64% and 37.06%, respectively. Three innovative modules - EdgeAttention, EdgeLoss and EdgeBoost - are proposed and added to EdgeYOLO, markedly improving segmentation accuracy on edge regions.

🔬 Method details

Problem definition: This paper targets the inefficiency and health risks of traditional tomato fruit phenotyping. Existing 2D and 3D methods suffer from complex calibration, fruit damage, and expensive equipment.

Core idea: TomatoScanner performs non-contact phenotyping from RGB images, avoiding the drawbacks of traditional methods. It combines instance segmentation with depth estimation, extracting and fusing features to output accurate phenotype results.

Technical framework: The architecture consists of three main modules: first, EdgeYOLO performs instance segmentation to extract pixel features; second, Depth Pro performs depth estimation; finally, the pixel and depth features are fused to output the final phenotype data.
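The summary does not spell out the fusion step. As an illustrative sketch only (not the authors' exact formulation), a standard pinhole camera model can convert a length measured in pixels on the segmentation mask into a real-world length, using the estimated depth and the camera focal length; the function name and all numbers below are assumptions:

```python
def pixel_to_metric(pixel_len: float, depth_m: float, focal_px: float) -> float:
    """Convert a length in pixels to metres via the pinhole camera model.

    By similar triangles: real_size / depth = pixel_size / focal_length,
    so real_size = pixel_size * depth / focal_length.
    """
    return pixel_len * depth_m / focal_px

# Hypothetical example: a tomato mask spans 120 px horizontally, the depth
# estimator reports 0.8 m to the fruit, and the focal length is 1000 px.
width_m = pixel_to_metric(120, 0.8, 1000.0)
print(f"estimated width: {width_m * 100:.1f} cm")  # estimated width: 9.6 cm
```

This is why a monocular depth estimate is enough to avoid both a depth camera and a physical calibration object placed next to the fruit.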

Key innovation: The paper proposes three modules - EdgeAttention, EdgeLoss and EdgeBoost - that focus on improving segmentation accuracy on edge regions, yielding a marked improvement over existing methods.

Key design: EdgeYOLO stays lightweight and efficient, with a 48.7 M weights size and 76.34 FPS. By optimizing the loss function and network structure, Precision improves from 0.943 to 0.986 and Mean Edge Error drops from 5.641% to 2.963%.
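The summary does not give the EdgeLoss formulation. One common way to make a segmentation loss emphasize boundary pixels, shown here purely as an illustrative sketch and not as the paper's actual EdgeLoss, is to up-weight a per-pixel loss wherever an edge mask is active (function name and weights are hypothetical):

```python
import numpy as np

def edge_weighted_bce(pred: np.ndarray, target: np.ndarray,
                      edge_mask: np.ndarray, edge_weight: float = 5.0,
                      eps: float = 1e-7) -> float:
    """Per-pixel binary cross-entropy, up-weighted on edge pixels.

    pred, target, edge_mask share one shape; edge_mask is 1 on mask
    boundaries and 0 elsewhere. Illustrative only, not the paper's EdgeLoss.
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    weights = 1.0 + (edge_weight - 1.0) * edge_mask  # edge_weight on edges, 1 elsewhere
    return float((weights * bce).mean())
```

Training with such a weighting penalizes boundary mistakes more than interior ones, which is one plausible route to the kind of edge-error reduction reported above.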


📊 Experimental highlights

TomatoScanner achieves median relative errors of 5.63%, 7.03%, -0.64% and 37.06% on width, height, vertical area and volume, respectively. EdgeYOLO's segmentation accuracy improves markedly: Precision rises from 0.943 to 0.986 and Mean Edge Error drops from 5.641% to 2.963%.
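For reference, a signed median relative error (the metric reported above) can be computed as below; keeping the sign explains how a value like -0.64% can arise when under-estimates dominate. The data here is made up for illustration and is not from the paper:

```python
from statistics import median

def median_relative_error(pred: list[float], truth: list[float]) -> float:
    """Signed median relative error in percent: median((p - t) / t) * 100."""
    return median((p - t) / t for p, t in zip(pred, truth)) * 100

# Made-up width measurements (cm): predictions vs. ground truth.
pred  = [5.2, 6.1, 4.8, 5.9]
truth = [5.0, 6.0, 5.0, 5.5]
print(f"{median_relative_error(pred, truth):.2f}%")  # 2.83%
```

Because positive and negative per-fruit errors cancel in a signed median, this metric rewards unbiased estimates rather than uniformly small ones.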

🎯 Application scenarios

Potential applications include agricultural automation, smart greenhouse management and precision agriculture. Efficient tomato fruit phenotyping lets farmers monitor crop growth in real time, optimize management strategies, and improve crop yield and quality, giving the work clear practical value and future impact.

📄 Abstract (original)

In tomato greenhouse, phenotypic measurement is meaningful for researchers and farmers to monitor crop growth, thereby precisely control environmental conditions in time, leading to better quality and higher yield. Traditional phenotyping mainly relies on manual measurement, which is accurate but inefficient, more importantly, endangering the health and safety of people. Several studies have explored computer vision-based methods to replace manual phenotyping. However, the 2D-based need extra calibration, or cause destruction to fruit, or can only measure limited and meaningless traits. The 3D-based need extra depth camera, which is expensive and unacceptable for most farmers. In this paper, we propose a non-contact tomato fruit phenotyping method, titled TomatoScanner, where RGB image is all you need for input. First, pixel feature is extracted by instance segmentation of our proposed EdgeYOLO with preprocessing of individual separation and pose correction. Second, depth feature is extracted by depth estimation of Depth Pro. Third, pixel and depth feature are fused to output phenotype results in reality. We establish self-built Tomato Phenotype Dataset to test TomatoScanner, which achieves excellent phenotyping on width, height, vertical area and volume, with median relative error of 5.63%, 7.03%, -0.64% and 37.06%, respectively. We propose and add three innovative modules - EdgeAttention, EdgeLoss and EdgeBoost - into EdgeYOLO, to enhance the segmentation accuracy on edge portion. Precision and mean Edge Error greatly improve from 0.943 and 5.641% to 0.986 and 2.963%, respectively. Meanwhile, EdgeYOLO keeps lightweight and efficient, with 48.7 M weights size and 76.34 FPS. Codes and datasets: https://github.com/AlexTraveling/TomatoScanner.