
A method for grape-picking scene perception and picking-point localization based on 3D point clouds

Perceiving grape harvest scene to locate picking point using 3D point clouds

  • Abstract: To achieve high-precision localization of 3D picking points for grape clusters in unstructured orchard environments, this study proposes a visual perception and cognition method that fuses 3D point clouds with 2D images. First, the Point Transformer V2 model is used for fine-grained semantic segmentation of the picking scene, providing semantic support for subsequent clustering and picking-point localization. Second, combining the morphological characteristics of grape clusters, a 3D grape picking point localization algorithm (3D GPPLA) is proposed: a two-stage clustering strategy based on DBSCAN and K-Means effectively separates multiple grape clusters and localizes the 3D picking point of each individual cluster. To handle cases where 3D localization fails, a compensation mechanism based on RGB images is further introduced, in which the SegFormer model performs 2D semantic perception and a 2D grape picking point localization algorithm (2D GPPLA) completes coordinate projection and 3D accuracy compensation. Experimental results show that Point Transformer V2 achieves an mIoU of 89.83% on the semantic segmentation task, with per-class IoU values of 78.55% for peduncles and 84.20% for branches. In experiments on 1,847 grape-cluster samples, the 3D GPPLA algorithm achieves picking-point localization success rates of 98.81% in single-cluster and 80.95% in multi-cluster scenarios, and 89.11% overall. The results verify the high precision and robustness of the proposed method for 3D picking-point localization, providing technical support for optimizing the vision systems of grape-harvesting robots and for low-damage harvesting in unstructured environments.

     

    Abstract: Accurate recognition of grape picking points is essential for intelligent, efficient, and non-destructive harvesting by grape-picking robots. However, the robustness of 3D localization is often limited by factors such as occlusion, irregular lighting, and the complex spatial distribution of grape clusters in unstructured orchard environments, so improving the overall reliability of harvesting decisions is highly desirable. In this study, a dual-modal visual perception and cognition framework was proposed that integrates 3D point clouds and 2D RGB images for robust and precise picking-point localization under diverse orchard conditions. First, the 3D scene was segmented semantically. Point Transformer V2 (PTV2), a point-cloud processing model incorporating grouped vector attention and relative positional encoding, was adopted to capture both local geometric structures and long-range contextual dependencies. Point clouds acquired from a depth camera were semantically segmented into classes such as grapes, peduncles, and branches, forming the structural foundation for subsequent geometric analysis. The PTV2 model achieved high segmentation accuracy, with a mean Intersection over Union (mIoU) of 89.83%; the IoU values for the peduncle and branch classes were 78.55% and 84.20%, respectively, demonstrating strong recognition performance in real orchard scenarios. A 3D Grape Picking Point Localization Algorithm (3D GPPLA) was then proposed to determine picking points within complex arrangements of grape clusters. A two-stage clustering procedure using DBSCAN and K-Means was introduced, by which multi-cluster grape point clouds were partitioned into independent candidate cluster point clouds. Morphological validation was subsequently performed to determine whether each cluster corresponded to an individual grape bunch.
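The two-stage clustering described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the paper's implementation: the `eps`, `min_samples`, and width-based choice of the K-Means cluster count are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

def split_grape_clusters(points, eps=0.05, min_samples=30, max_k=4):
    """Two-stage clustering of grape-class points (an (N, 3) array in
    metres): DBSCAN for coarse spatial separation and noise removal,
    then K-Means to split clusters that appear to merge several bunches."""
    # Stage 1: density-based clustering separates spatially distinct bunches.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    candidates = []
    for lbl in set(labels) - {-1}:          # -1 marks DBSCAN noise points
        cluster = points[labels == lbl]
        # Stage 2: if the cluster is too wide to be a single bunch,
        # refine it with K-Means (the 0.15 m width unit is an assumption).
        width = np.ptp(cluster[:, 0])
        k = min(max_k, max(1, int(np.ceil(width / 0.15))))
        if k == 1:
            candidates.append(cluster)
        else:
            sub = KMeans(n_clusters=k, n_init=10).fit_predict(cluster)
            candidates.extend(cluster[sub == j] for j in range(k))
    return candidates
```

In practice the validation step after clustering (morphology checks and rollback) decides whether such a partition is accepted; this sketch only shows the clustering itself.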
The recursion depth was restricted to prevent over-segmentation and preserve computational efficiency; when an invalid partition was produced within the allowed depth, the system rolled back to the previous clustering state for stability. Once a single grape bunch was identified, 3D GPPLA estimated the picking point from the spatial relationship between the grape centroid and the peduncle region. Specifically, the minimum bounding box around the grape-peduncle subset was computed to determine the optimal picking direction, and peduncle proximity and accessibility were evaluated to minimize damage during separation and ensure consistent harvesting performance. A 2D fallback strategy was also introduced to further enhance robustness in cases where the 3D approach failed due to severe occlusion, missing depth data, or segmentation noise. When such a failure was detected, the system switched to inference on the 2D RGB image. Leveraging SegFormer, a transformer-based semantic segmentation network, the image was partitioned into high-fidelity semantic regions, including grapes and peduncles. The 2D GPPLA algorithm then computed picking points in image space according to shape heuristics and spatial priors, and the result was projected back into the 3D point cloud through depth-aligned pixel mapping. This fallback mechanism enhanced resilience in cluttered and partially observable environments, while the richer texture and color cues in RGB images compensated for limitations in point-cloud resolution and sensor noise. On a custom dataset of 1,847 grape clusters collected under natural orchard conditions, 3D GPPLA achieved a picking-point localization success rate of 89.11%.
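The depth-aligned projection of a 2D picking point back into 3D can be illustrated with the standard pinhole camera model. This is a generic sketch: the intrinsics in the usage example are placeholder values, not the parameters of the depth camera used in the study.

```python
import numpy as np

def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with its aligned depth value (metres)
    into a 3D point in the camera frame using the pinhole model:
    X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])
```

For example, a picking point detected at the principal point, `pixel_to_3d(320, 240, 1.0, fx=600, fy=600, cx=320, cy=240)`, maps to the point one metre straight ahead on the optical axis.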
In particular, the success rates were 98.81% in single-cluster scenarios and 80.95% in multi-cluster arrangements, highlighting the adaptability of the algorithm to varying levels of structural complexity. Combined with the 2D fallback strategy, the framework achieved high overall reliability and significantly reduced failure cases in cluttered and occluded scenarios. By integrating advanced 3D semantic segmentation, adaptive multi-stage clustering, and cross-modal compensation, accurate, stable, and efficient picking-point localization was achieved in unstructured vineyard environments. These findings provide a solid technical basis for the practical deployment of grape-harvesting robots in smart agriculture.
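The oriented-bounding-box step used to choose a picking direction can be approximated with a PCA-based box fit. This is a sketch under the assumption, made here for illustration only, that the axis of smallest extent of the grape-peduncle subset gives a reasonable approach direction; the paper's exact formulation may differ.

```python
import numpy as np

def picking_direction(subset_points):
    """Fit an oriented bounding box to the grape-peduncle point subset
    via PCA. Returns the box centre, its axes (columns of `eigvecs`),
    the extents along each axis, and the axis of smallest extent as a
    candidate approach direction (an assumption for illustration)."""
    centre = subset_points.mean(axis=0)
    centred = subset_points - centre
    # Eigenvectors of the covariance matrix give the box orientation;
    # eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred.T))
    extents = np.ptp(centred @ eigvecs, axis=0)
    approach_axis = eigvecs[:, 0]  # smallest-variance direction
    return centre, eigvecs, extents, approach_axis
```

For a roughly planar cluster, the returned approach axis points along the cluster's thin dimension, i.e. perpendicular to the foliage plane, which is the intuition behind using the minimum bounding box to orient the cut.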

     
