Localization of "pile picking" cutting points for green Sichuan pepper branches based on improved U-Net and RGB-D images

蒲应俊; 张文州; 李金广; 赵立军; 陈子文; 杨明金

doi:10.11975/j.issn.1002-6819.202505128

Abstract

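"Pile picking" of green Sichuan pepper is a harvesting method in which branches bearing fresh fruit are cut off while a short stump of a set length is left on the tree. To enable a green Sichuan pepper picking robot to recognize branches accurately and determine the optimal cutting point for efficient pile picking, this study proposes a method for locating the pile-picking cutting point on main branches that combines a U-Net deep learning network with an RGB-D camera. First, the conventional U-Net model is improved by replacing its backbone with a ResNet50 network that embeds the Coordinate Attention (CA) mechanism and by adding the Squeeze-and-Excitation (SE) attention mechanism at the feature concatenation stage, yielding a segmentation model for the main branches and trunk. Next, the segmented image is binarized and skeletonized to obtain the main-branch centerline, and depth information from the RGB-D camera is combined with OpenCV image processing algorithms to map lengths between the world and pixel coordinate systems. The preset stump length of 40 mm is then mapped from the world coordinate system to a pixel length in the RGB image, which fixes the cutting point on each main branch. Experiments show that the improved U-Net outperforms DeepLabV3+ and PSPNet in segmentation, with a mean intersection over union (MIoU), mean pixel accuracy (mPA), and recall of 87.58%, 93.76%, and 96.24%, respectively. Under sunny front-lit, sunny backlit, and overcast conditions, the success rates of cutting-point identification and localization were 90.81%, 84.88%, and 80.52%, respectively. In the picking-point localization test, the localization success rate was 90%, and identification took 1.93 s per branch on average. The results can provide technical support for pile-picking harvesting by green Sichuan pepper picking robots.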
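The skeleton-extraction step can be illustrated with Zhang-Suen thinning, the algorithm this study reports using to obtain the main-branch centerline. The following is a minimal pure-Python sketch, not the authors' OpenCV-based implementation: the function name `zhang_suen_thinning` and the list-of-lists mask format are assumptions made for illustration, and the mask is assumed to have a one-pixel background border.

```python
def zhang_suen_thinning(image):
    """Thin a binary mask (list of rows of 0/1) to a thin skeleton.

    Assumes the outermost rows and columns are background (0).
    Returns a new mask; the input is left unchanged.
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above (r-1, c)
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    def transitions(n):
        # Number of 0 -> 1 transitions in the circular sequence P2..P9, P2
        return sum(1 for a, b in zip(n, n[1:] + n[:1]) if a == 0 and b == 1)

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    if not 2 <= sum(n) <= 6 or transitions(n) != 1:
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Pixels are cleared together after each sub-iteration
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Applied to a segmented branch mask, the remaining foreground pixels approximate the branch centerline. In practice a library routine would normally replace this loop, since pure Python is slow on full-resolution images.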

Abstract: "Pile picking" is a harvesting method for green Sichuan pepper in which fruit-bearing branches are cut while short stumps of a predetermined length are deliberately left on the tree. Retaining part of the branch structure supports future growth, improves harvesting efficiency, and maintains plant health. To enable a Sichuan pepper harvesting robot to recognize branches accurately and determine optimal cutting points in complex field environments with dense foliage and varying illumination, this study proposes a method for localizing pile-picking cutting points on the main branches of green Sichuan pepper based on a U-Net deep learning network and an RGB-D camera. The method combines semantic segmentation for branch identification with depth information for spatial localization, forming a complete pipeline from image acquisition to cutting-point coordinates. First, the conventional U-Net model is improved by replacing its backbone with ResNet50 embedded with a Coordinate Attention (CA) mechanism, which strengthens the capture of spatially fine-grained features and thereby improves the boundary completeness and segmentation precision of branch structures. In addition, a Squeeze-and-Excitation (SE) attention mechanism is added at the feature concatenation stage of U-Net to adaptively recalibrate channel-wise feature responses. The resulting model segments the main branches and trunk of Sichuan pepper and distinguishes them from complex backgrounds that include leaves, fruits, and interfering branches.
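To make "recalibrating channel-wise feature responses" concrete, here is a minimal pure-Python sketch of the standard SE computation (squeeze, excitation, scale). It does not reproduce the authors' network: the function name `se_recalibrate`, the nested-list tensor layout, the omission of bias terms, and the weight matrices `w1` and `w2` are all assumptions for illustration.

```python
import math


def se_recalibrate(feature_map, w1, w2):
    """Apply Squeeze-and-Excitation gating to a feature map.

    feature_map: list of C channels, each an H x W list of floats.
    w1: hidden x C weights of the reduction layer (hypothetical values).
    w2: C x hidden weights of the expansion layer (hypothetical values).
    Returns (scaled_feature_map, gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: every value in a channel is multiplied by that channel's gate
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

Each gate lies between 0 and 1, so informative channels are passed through almost unchanged while less useful ones are attenuated before the features are used further.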
Then, the segmented images of the main branches and trunk are binarized, and the Zhang-Suen thinning algorithm is used to extract the centerline of each main branch. Depth information from the RGB-D camera is combined with OpenCV image processing algorithms to map lengths between coordinate systems: a length in the pixel coordinate system is first converted to a physical length in the image coordinate system using camera intrinsic parameters, including focal length and pixel size, and is then converted to an actual length in the world coordinate system using the depth measured by the RGB-D camera. With this mapping, the predefined stump length of 40 mm is converted from the world coordinate system to the corresponding pixel length in the RGB image, which determines the cutting point on each main branch in the image plane. Experimental results show that the improved U-Net outperforms DeepLabV3+ and PSPNet in segmentation, achieving a mean intersection over union (MIoU) of 87.58%, a mean pixel accuracy (mPA) of 93.76%, and a recall of 96.24%. Under different lighting conditions, the success rates for identifying and locating the cutting points were 90.81% under sunny front-lit conditions, 84.88% under sunny backlit conditions, and 80.52% under overcast conditions.
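The length mapping described above can be sketched with the pinhole camera model. For a segment roughly parallel to the image plane at depth Z, a world length L projects to about L · f / (Z · s) pixels, where f is the focal length and s the pixel size. The code below is an illustration under that assumption, not the authors' implementation: the function names, the parameter values in the usage note, and the ordered-centerline input are hypothetical.

```python
import math


def stump_length_in_pixels(stump_mm, depth_mm, focal_mm, pixel_size_mm):
    """Project a world length (mm) at a given depth (mm) to a pixel length.

    Assumes a pinhole camera and a segment parallel to the image plane.
    """
    if depth_mm <= 0 or pixel_size_mm <= 0:
        raise ValueError("depth and pixel size must be positive")
    length_on_sensor_mm = stump_mm * focal_mm / depth_mm  # world -> image plane
    return length_on_sensor_mm / pixel_size_mm            # image plane -> pixels


def cutting_point(centerline, target_px):
    """Walk along an ordered centerline and locate the cutting point.

    centerline: list of (x, y) pixel coordinates, starting at the branch base.
    Returns the first point whose path length from the base reaches
    target_px, or None if the centerline is shorter than that.
    """
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        travelled += math.hypot(x1 - x0, y1 - y0)
        if travelled >= target_px:
            return (x1, y1)
    return None
```

For example, with the hypothetical values `focal_mm=2.0`, `pixel_size_mm=0.004`, and a branch base measured at `depth_mm=500`, the 40 mm stump corresponds to roughly 40 pixels, and `cutting_point` returns the centerline pixel about that far from the base. Branches inclined toward or away from the camera would need the depth at both ends of the segment, which this sketch ignores.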
In the cutting point localization experiment, the localization success rate was 90%, and identifying a single branch took 1.93 s on average. These results can provide technical support for "pile picking" harvesting by green Sichuan pepper picking robots.
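For reference, the segmentation metrics reported above are commonly computed from a per-class confusion matrix. The sketch below uses the usual definitions of MIoU and mPA; the abstract does not state the exact formulas the authors used, so the definitions and the function name `segmentation_metrics` are assumptions.

```python
def segmentation_metrics(confusion):
    """Compute (MIoU, mPA) from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as
    class j. Classes with no pixels contribute 0 to the averages.
    """
    n = len(confusion)
    ious, accuracies = [], []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                         # missed pixels of class i
        fp = sum(confusion[j][i] for j in range(n)) - tp    # wrongly labelled as i
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        accuracies.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ious) / n, sum(accuracies) / n
```

With a two-class matrix such as `[[8, 2], [1, 9]]`, the per-class IoUs are 8/11 and 9/12 and the per-class pixel accuracies are 0.8 and 0.9, so the function returns their means.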
