Abstract:
Lingwu long jujube, commonly referred to as the Chinese date, is one of the specialty fruits of the Ningxia Hui Autonomous Region. Image segmentation is now widely used in modern agriculture to identify the ripeness of Lingwu long jujubes. Traditional image segmentation performs well on the red parts of the jujubes but poorly on the green parts. Deep learning can therefore contribute to multi-scale object segmentation of jujubes at various stages of ripeness, and improved network models can be expected to extract multi-scale features from the differently sized objects in Lingwu long jujube images. In practice, the visual recognition system must keep pace with the execution time of the picking robot's actuator in a complex working environment. Accordingly, the segmentation network must be small, shallow, and highly accurate at a relatively low picking speed. In this study, an improved FCN-8s was adopted as the base network for segmenting images of Lingwu long jujubes of different ripeness. First, an image dataset of Lingwu long jujubes was established, comprising 196 training images and 46 test images. Because the collected images had an initial resolution of 4 000×3 000, a resolution of 1 280×960 was used for training to improve training efficiency. Then, a multi-scale feature extraction module was proposed to extract features beyond the 3×3 scale. Specifically, a 1×1 convolution branch and a 5×5 convolution branch were added to the single 3×3 standard convolution in FCN-8s. Because the two auxiliary branches introduced many additional parameters, a depth-wise separable convolution was used for the 5×5 branch to reduce them. The 3×3 standard convolutions in FCN-8s were then replaced with the proposed module. Further changes were made to FCN-8s to reduce the number of network parameters and improve efficiency.
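The parameter saving from the depth-wise separable 5×5 branch can be illustrated with a short weight count. This is only a sketch under stated assumptions, not the authors' implementation: biases are ignored, every branch is assumed to map the same number of input channels to the same number of output channels, and the 64-channel width and the function names are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def module_params(c_in, c_out, separable_5x5=True):
    """Weights of a three-branch module: 1x1, 3x3, and 5x5 branches.

    Assumption: each branch outputs c_out channels; how the branches
    are fused is not specified here and does not add weights.
    """
    p1 = conv_params(1, c_in, c_out)
    p3 = conv_params(3, c_in, c_out)
    if separable_5x5:
        p5 = depthwise_separable_params(5, c_in, c_out)
    else:
        p5 = conv_params(5, c_in, c_out)
    return p1 + p3 + p5


if __name__ == "__main__":
    c = 64  # hypothetical channel width, chosen only for illustration
    print("single 3x3 conv:           ", conv_params(3, c, c))
    print("module, standard 5x5:      ", module_params(c, c, separable_5x5=False))
    print("module, separable 5x5:     ", module_params(c, c))
    print("module, separable, c // 2: ", module_params(c // 2, c // 2))
```

Under these assumptions, most of the extra cost of the module comes from a standard 5×5 convolution, which is why making that branch depth-wise separable, and narrowing the channels, has a large effect on the weight count.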
The 14th and 15th convolution layers of the original FCN-8s were removed while maintaining segmentation accuracy, and the up-sampling operation was performed directly after the 5th down-sampling operation. In addition, the number of output feature-map channels in each layer of the three-branch multi-scale feature extraction module was halved relative to the original network. The resulting improved FCN-8s thereby increased the width of the whole network. On the Lingwu long jujube dataset, the intersection over union, mean intersection over union, precision, recall, and F1 score were 93.50%, 96.41%, 98.44%, 97.86%, and 98.15%, respectively, which were 11.31, 6.20, 1.51, 5.21, and 3.14 percentage points higher than those of the original FCN-8s. The improved FCN-8s had 5.37 million parameters and a segmentation speed of 16.20 frames/s. Compared with SegNet, ENet, and PSPNet, the improved FCN-8s showed clear advantages in meeting the demanding visual recognition requirements of a picking robot for Lingwu long jujubes.
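For readers unfamiliar with the reported metrics, the following sketch shows the standard pixel-level definitions of precision, recall, F1 score, and intersection over union for a single foreground class. It assumes these standard definitions apply; the abstract does not state how the metrics were aggregated over images or classes, and the pixel counts in the example are hypothetical.

```python
def precision_recall_f1_iou(tp, fp, fn):
    """Standard metrics from true-positive, false-positive, and false-negative pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical pixel counts, for illustration only.
    p, r, f1, iou = precision_recall_f1_iou(tp=90, fp=10, fn=10)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f} iou={iou:.4f}")

    # Harmonic mean of the precision and recall reported for the improved FCN-8s.
    print(f"F1 from reported precision and recall: {f1_score(98.44, 97.86):.2f}")
```

Under the harmonic-mean definition, the precision (98.44%) and recall (97.86%) reported for the improved FCN-8s give an F1 score that rounds to the reported 98.15%.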