Path planning for complex orchards based on aerial images and improved U-net semantic segmentation

蒲应俊; 任安华; 方智恒; 李姣姣; 赵立军; 陈子文; 杨明金

doi:10.11975/j.issn.1002-6819.202503185

Complex path planning in orchard using aerial images and improved U-net semantic segmentation

Abstract

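Abstract: To meet the navigation needs of robots operating in complex orchards in hilly and mountainous areas, and to raise the level of automation and intelligence in orchard management, this study proposes a path planning method for orchard robots based on aerial images and an improved U-net semantic segmentation network. First, a UAV captures aerial images of the orchard in different operating periods, and the improved U-net model identifies key features in the orchard such as fruit trees, flagstone roads, and drainage ditches. In the encoder, the activation function is replaced with Swish and max pooling is replaced with max blur pooling to strengthen feature extraction. An RFB module is inserted after the encoder and an SE attention mechanism is inserted after every convolution block of the decoder to enlarge the receptive field and improve the segmentation of key features across operating periods. Next, the identified fruit trees, flagstone roads, and drainage ditches are used to delimit the non-passable areas and build the orchard map. Finally, an improved A* algorithm plans paths on the orchard map: the original unidirectional A* is changed to a bidirectional A* to speed up planning, a dynamic heuristic function improves the accuracy of the planned path, and an orchard turning-endpoint search algorithm obtains the turning endpoints of the path so that the planned path passes every fruit tree in the orchard. The results show that the improved U-net model reaches a mean intersection over union (mIoU) of 92.25%, which is 2.34, 17.00, 7.83, and 19.11 percentage points higher than the original U-net, Res_U-net, DeeplabV3+, and PSPnet, respectively, and that it performs best on the datasets of every operating period. In addition, the root-mean-square error between the paths planned by the improved A* algorithm and the ideal driving path of the orchard robot is 0.278–0.710 m, and the average planning time is 36.87 s, which is 3.87, 6.21, and 6.41 s shorter than that of the original A*, D*, and Dijkstra algorithms, respectively. The method can provide an effective reference for path planning in real complex orchards.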
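The bidirectional A* idea described above can be illustrated with a minimal sketch. It assumes a 4-connected occupancy grid (0 = passable, 1 = non-passable) and a plain Manhattan heuristic; the dynamic heuristic and the turning-endpoint search of the study are not reproduced here, and all function names are hypothetical. The two searches alternate and stop as soon as one closes a cell that the other has already closed, which yields a valid path that is not guaranteed to be the shortest one.

```python
import heapq


def manhattan(a, b):
    """Grid distance used as the heuristic (an assumption, not the paper's dynamic heuristic)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbors(grid, cell):
    """Yield the 4-connected passable neighbours of a cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def walk_back(parents, cell):
    """Follow parent links from a cell back to the root of its search tree."""
    path = [cell]
    while parents[cell] is not None:
        cell = parents[cell]
        path.append(cell)
    return path


def bidirectional_astar(grid, start, goal):
    """Alternate a forward and a backward A* search until their closed sets meet."""
    if start == goal:
        return [start]
    sides = []
    for src, dst in ((start, goal), (goal, start)):
        sides.append({
            "open": [(manhattan(src, dst), 0, src)],  # heap of (f, g, cell)
            "g": {src: 0},
            "parent": {src: None},
            "closed": set(),
            "target": dst,
        })
    turn = 0
    while sides[0]["open"] and sides[1]["open"]:
        this, other = sides[turn], sides[1 - turn]
        turn = 1 - turn
        _, g, cell = heapq.heappop(this["open"])
        if cell in this["closed"]:
            continue  # stale heap entry
        this["closed"].add(cell)
        if cell in other["closed"]:
            # The searches met: join start..cell with cell..goal.
            forward = walk_back(sides[0]["parent"], cell)[::-1]
            backward = walk_back(sides[1]["parent"], cell)
            return forward + backward[1:]
        for nxt in neighbors(grid, cell):
            new_g = g + 1
            if new_g < this["g"].get(nxt, float("inf")):
                this["g"][nxt] = new_g
                this["parent"][nxt] = cell
                f = new_g + manhattan(nxt, this["target"])
                heapq.heappush(this["open"], (f, new_g, nxt))
    return None  # one search exhausted its region without meeting the other
```

Calling `bidirectional_astar(grid, start, goal)` returns a list of `(row, col)` cells from `start` to `goal`, or `None` when the two cells are not connected.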

Abstract: Orchard robots working in complex hilly and mountainous terrain need accurate and rapid navigation in order to advance the automation and intelligence of orchard management. This study introduces a path planning approach for orchard robots that uses aerial imagery and an improved U-net semantic segmentation network. First, drones captured aerial images of the orchard during the fertilization (April 2024), weeding (September 2024), and harvesting (November 2024) periods. Labelme was used to annotate the fruit trees, flagstone road, and stone slabs in the images and to generate the mask images. An orchard aerial dataset was then built from the annotated images after data augmentation. Second, the improved U-net model was trained on this dataset to extract the critical orchard features (fruit trees, flagstone road, and drainage ditches) from the aerial images. The convolutional structure of the original encoder was retained, while downsampling was changed from MaxPool to MaxBlurPool to reduce the loss of fine detail and improve the generalization of the model. The ReLU activation was replaced with Swish to keep gradients flowing through the encoder and mitigate vanishing gradients. Third, a Receptive Field Block (RFB) was inserted at the final stage of the encoder. Its multi-scale convolutions provide receptive fields of different sizes, so the improved model captures more diverse information about the orchard environment. Finally, a Squeeze-and-Excitation (SE) attention mechanism was appended to every decoder block to improve prediction in the complex environment across all operating periods. The improved U-net was then used to predict the key features: fruit trees, flagstone road, and stone slabs.
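Two of the encoder changes named above, Swish and MaxBlurPool, can be sketched on a one-dimensional signal. This is an illustration under stated assumptions, not the study's network code: Swish is taken with its scale parameter fixed to 1, and max blur pooling is taken in its usual anti-aliased form (dense max with stride 1, then a low-pass blur, then subsampling). The `[0.25, 0.5, 0.25]` kernel, the edge-replication padding, and the function names are assumptions made for this example.

```python
import math


def swish(x):
    """Swish activation x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def max_pool_1d(signal, window=2, stride=2):
    """Plain max pooling over a list of numbers."""
    return [max(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, stride)]


def max_blur_pool_1d(signal, window=2, stride=2, kernel=(0.25, 0.5, 0.25)):
    """Max blur pooling: dense max, then blur, then subsample."""
    # Step 1: max over each window, evaluated at every position (stride 1).
    dense = max_pool_1d(signal, window, stride=1)
    # Step 2: low-pass filter with edge replication so the length is unchanged.
    half = len(kernel) // 2
    padded = [dense[0]] * half + dense + [dense[-1]] * half
    blurred = [sum(k * padded[i + j] for j, k in enumerate(kernel))
               for i in range(len(dense))]
    # Step 3: subsample the blurred signal.
    return blurred[::stride]
```

Unlike ReLU, `swish` returns small negative values for negative inputs instead of a hard zero, and `max_blur_pool_1d` smooths the pooled signal before subsampling, which is the property the abstract relies on for preserving fine detail.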
After identification, the connected fruit tree regions were divided into single-tree regions to facilitate fruit tree positioning and path planning, using a layer-by-layer opening operation. 1) Connected component analysis was performed on the fruit tree image to separate the single-tree regions from the multi-tree regions. Horizontal and vertical segmentation was applied to the multi-tree regions, and the processed multi-tree image was merged with the unprocessed single-tree image to obtain a new fruit tree image, which completed the first layer of segmentation. The second layer and the subsequent layers (layers 3 to 16) followed the same procedure until all fruit tree regions were segmented into single-tree regions. Drainage ditches were inferred from the stone-slab regions identified by the model. 2) The single-tree regions, flagstone road, and drainage ditches were classified into passable and non-passable zones to generate an orchard navigation map. The scale of the map was computed by relating the pixel spacing between stone slabs to their measured physical distance. 3) An improved A* algorithm was applied to plan paths on the orchard map. The conventional unidirectional A* was replaced with a bidirectional variant to accelerate planning, and a dynamic heuristic improved the path accuracy along the stone-slab roads. An orchard-specific turning-endpoint search was then used to identify the turning endpoints, ensuring that the planned route passed every fruit tree in the orchard. The results showed that the improved U-net achieved a mean Intersection-over-Union (mIoU) of 92.25%, outperforming the original U-Net, Res-U-Net, DeepLabV3+, and PSPNet by 2.34, 17.00, 7.83, and 19.11 percentage points, respectively.
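The connected component analysis in step 1 and the map scale in step 2 can be sketched as follows. The sketch assumes a binary tree mask stored as nested lists (1 = fruit tree pixel) and 4-connectivity. Separating single-tree from multi-tree regions by a pixel-area threshold is an assumption made for illustration, since the abstract does not state the criterion, and the function names are hypothetical.

```python
import math
from collections import deque


def label_regions(mask):
    """Return the 4-connected regions of 1-pixels, each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited tree pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def split_by_area(regions, max_single_tree_area):
    """Assumed criterion: regions larger than the threshold hold several trees."""
    single = [p for p in regions if len(p) <= max_single_tree_area]
    multi = [p for p in regions if len(p) > max_single_tree_area]
    return single, multi


def metres_per_pixel(pixel_a, pixel_b, measured_distance_m):
    """Map scale from two reference pixels and their measured ground distance."""
    pixel_distance = math.hypot(pixel_a[0] - pixel_b[0], pixel_a[1] - pixel_b[1])
    return measured_distance_m / pixel_distance
```

In this sketch the multi-tree regions returned by `split_by_area` are the ones that would go on to the horizontal and vertical segmentation, and `metres_per_pixel` converts pixel distances on the map into metres.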
The improved U-net's mean pixel accuracy (MPA) of 95.72% exceeded that of the original U-Net, Res-U-Net, DeepLabV3+, and PSPNet by 1.40, 15.76, 2.93, and 4.37 percentage points, respectively. The improved model also achieved the best performance on the datasets of every operating period. In addition, the paths planned by the improved A* algorithm had a root-mean-square error of 0.278–0.710 m relative to the ideal driving path of the orchard robot, and the mean planning time was 36.87 s, which was 3.87, 6.21, and 6.41 s shorter than that of the original A*, D*, and Dijkstra algorithms, respectively. The approach can offer a practical reference for path planning in complex real-world orchards.
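The three reported metrics can be computed as in the sketch below, which assumes the standard definitions: per-class IoU is TP / (TP + FP + FN), per-class pixel accuracy is TP / (TP + FN), and both are averaged over classes. The confusion matrix is assumed to have ground-truth classes as rows and predicted classes as columns. For the path error, the sketch assumes the planned and ideal paths are sampled at corresponding points; the abstract does not say how the study pairs them, so that pairing and the function names are assumptions.

```python
import math


def miou_and_mpa(confusion):
    """Mean IoU and mean pixel accuracy from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes that never occur are skipped to avoid division by zero.
    """
    n = len(confusion)
    ious, accuracies = [], []
    for k in range(n):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                          # TP + FN
        predicted = sum(confusion[i][k] for i in range(n))  # TP + FP
        union = actual + predicted - true_positive          # TP + FP + FN
        if union > 0:
            ious.append(true_positive / union)
        if actual > 0:
            accuracies.append(true_positive / actual)
    return sum(ious) / len(ious), sum(accuracies) / len(accuracies)


def path_rmse(planned, ideal):
    """Root-mean-square of point-to-point distances between two sampled paths."""
    if len(planned) != len(ideal) or not planned:
        raise ValueError("paths must be non-empty and sampled at matching points")
    squared = [(px - ix) ** 2 + (py - iy) ** 2
               for (px, py), (ix, iy) in zip(planned, ideal)]
    return math.sqrt(sum(squared) / len(squared))
```

With path coordinates expressed in metres, `path_rmse` returns a value in metres that is directly comparable to the 0.278–0.710 m range quoted above.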
