Image segmentation method for Lingwu long jujubes based on improved FCN-8s

薛君蕊; 王昱潭; 曲爱丽; 张加欣; 邢振伟; 魏海岩; 孙浩伟

doi:10.11975/j.issn.1002-6819.2021.05.022

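Summary: To address the segmentation of multi-scale jujube targets in images of Lingwu long jujubes at different ripeness levels, this study proposes an image segmentation method based on an improved FCN-8s. First, a dataset of Lingwu long jujube images covering different ripeness levels was built. Second, a multi-scale feature extraction module was proposed; it uses a 3×3 convolution as the main branch and adds a 1×1 convolution and a 5×5 depthwise separable convolution as auxiliary branches to extract multi-scale features from the images. Next, the 3×3 convolutions in FCN-8s were replaced with this module and the FCN-8s network structure was further modified, yielding the improved FCN-8s. Finally, experiments on the Lingwu long jujube dataset showed that the improved FCN-8s reached a jujube-class intersection over union (IoU) of 93.50%, a mean IoU of 96.41%, a pixel accuracy of 98.44%, a recall of 97.86%, and an F1 score (the harmonic mean of pixel accuracy and recall) of 98.15%, which are 11.31, 6.20, 1.51, 5.21, and 3.14 percentage points higher, respectively, than the corresponding values for the original FCN-8s. The network has 5.37×10^6 parameters and a segmentation speed of 16.20 frames/s. The improved FCN-8s meets the requirements that an intelligent picking robot for Lingwu long jujubes places on its visual recognition system, and it provides technical support for the intelligent harvesting of Lingwu long jujubes.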
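The evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes a two-class problem (jujube and background), takes the mean IoU as the average of the two class IoUs, and treats the accuracy term that enters the F1 score as TP/(TP+FP); the source does not spell out these definitions, so they are assumptions, and the pixel counts in the example are hypothetical. The last line only checks that the reported 98.44% and 97.86% are consistent with the reported F1 of 98.15%.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Binary segmentation metrics from pixel counts (jujube = positive class).

    Assumed definitions (not taken from the source):
    mean IoU is the average of the jujube and background IoUs,
    and F1 is the harmonic mean of TP/(TP+FP) and TP/(TP+FN).
    """
    iou_jujube = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "iou": iou_jujube,
        "miou": (iou_jujube + iou_background) / 2,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),
    }


def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)


# Hypothetical pixel counts, for illustration only.
example = segmentation_metrics(tp=900, fp=50, fn=50, tn=9000)
print(example)

# Consistency check on the reported percentages.
print(round(harmonic_mean(98.44, 97.86), 2))  # 98.15
```

Because the harmonic mean is dominated by the smaller of its two inputs, the F1 score sits slightly below the arithmetic mean of the two reported values.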

Abstract: Lingwu long jujube, commonly referred to as the Chinese date, is one of the specialty fruits of the Ningxia Hui Autonomous Region. Image segmentation is widely used in modern agriculture to identify the ripeness of Lingwu long jujubes. Traditional image segmentation methods segment the red part of Lingwu long jujubes well, but not the green part. Deep learning can therefore contribute to multi-scale object segmentation for jujubes of varying ripeness, and an improved network model can be expected to extract multi-scale features from the differently sized objects in Lingwu long jujube images. In practice, the visual recognition system must keep pace with the execution time of the picking robot's actuator in a complex working environment. Given the relatively low picking speed, the segmentation network is required to be small, shallow, and highly accurate. In this study, FCN-8s was selected as the base network and improved for the segmentation of images of Lingwu long jujubes with different ripeness. First, an image dataset of Lingwu long jujubes was established, comprising 196 training images and 46 test images. The collected images had an initial resolution of 4 000×3 000; a resolution of 1 280×960 was used for training to improve training efficiency. Then, a multi-scale feature extraction module was proposed to extract features at scales other than 3×3. Specifically, a 1×1 convolution and a 5×5 convolution were added as auxiliary branches to the single 3×3 standard convolution in FCN-8s. Because the two auxiliary branches introduce many additional parameters, a depthwise separable convolution was used for the 5×5 convolution to reduce the parameter count. The 3×3 standard convolutions in FCN-8s were then replaced with the proposed module. Further changes were made to FCN-8s to reduce the number of network parameters and improve efficiency.
The 14th and 15th convolution layers of the original FCN-8s were removed while maintaining segmentation accuracy, and up-sampling was performed directly after the 5th down-sampling operation. In addition, the number of channels in the output feature maps of each layer built from the three-branch multi-scale feature extraction module was halved relative to the original network. The improved FCN-8s obtained in this way increases the width of the whole network. Experimental results on the Lingwu long jujube dataset showed that the jujube-class intersection over union, mean intersection over union, precision accuracy, recall, and F1 score were 93.50%, 96.41%, 98.44%, 97.86%, and 98.15%, respectively, which were 11.31, 6.20, 1.51, 5.21, and 3.14 percentage points higher than those of the original FCN-8s. The improved FCN-8s had 5.37 million parameters and a segmentation speed of 16.20 frames/s. Compared with SegNet, ENet, and PSPNet, the improved FCN-8s showed clear advantages in meeting the demanding visual recognition requirements of the picking robot for Lingwu long jujubes.
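The parameter argument behind the three-branch module can be checked with simple arithmetic. The following Python sketch counts weights for one module under stated assumptions: every branch maps the same number of input channels to the same number of output channels, biases and any normalization layers are ignored, and the channel width of 64 is a hypothetical example. None of these specifics come from the source, so the numbers illustrate the trend only and do not reproduce the reported 5.37 million parameters.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 conv that mixes channels
    return depthwise + pointwise


def multiscale_module_params(c_in, c_out):
    """3x3 main branch + 1x1 branch + 5x5 depthwise separable branch."""
    return (
        conv_params(c_in, c_out, 3)
        + conv_params(c_in, c_out, 1)
        + depthwise_separable_params(c_in, c_out, 5)
    )


c = 64  # hypothetical channel width
plain_3x3 = conv_params(c, c, 3)                        # 36864
standard_5x5 = conv_params(c, c, 5)                     # 102400
separable_5x5 = depthwise_separable_params(c, c, 5)     # 5696
module_full = multiscale_module_params(c, c)            # 46656
module_half = multiscale_module_params(c // 2, c // 2)  # 12064

print(plain_3x3, standard_5x5, separable_5x5, module_full, module_half)
```

Under these assumptions, the depthwise separable form makes the 5×5 branch far cheaper than a standard 5×5 convolution, and halving the channel count brings the whole three-branch module below the cost of the single 3×3 convolution it replaces.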
