Abstract:
Precise harvesting is often required in the complex natural environment of eggplant cultivation. In this study, an improved instance segmentation model, YOLO-CRC, was proposed on the YOLOv8-Seg framework using a reparameterized structure. A sample dataset of eggplants and their stems was constructed covering variations in illumination, orientation, and occlusion. Multimodal data augmentation was employed to expose the model to more diverse data during training: various transformations were applied to the original data to generate new training samples and expand the dataset. Firstly, a Diverse Branch Block (DBB) and a polarization attention mechanism were introduced to design the CDCP module in the Backbone, which significantly improved the segmentation of small targets. The DBB extracts features at varying scales, with each branch using a different convolution kernel size or operation type to reduce complexity; the branch outputs are then aggregated by summation, integrating information from multiple layers. The parallel structure of the DBB allows features at different scales to be fused effectively during inference, yielding stable detection performance under varying backgrounds and scene complexities. In the Neck, a reparameterization technique was applied to optimize the C2f module by introducing the RepViT Block, forming the C2f_RVB module. The RVB block enhances expressive power and captures richer feature details through multi-scale convolution and a channel-mixing mechanism. Furthermore, the RVB block employs depthwise separable convolutions and an adaptive channel attention mechanism: decomposing the convolution significantly reduces computational complexity, while the adaptive attention strengthens the focus on critical features by weighting the different feature channels. The CARAFE operator was used to replace the original upsampling, with the upsampling kernel generated by a content-aware mechanism. Semantic information from the low-resolution feature maps is thereby balanced to reconstruct spatial information more accurately, and computational efficiency is maintained while feature reuse is enhanced, improving the fusion of local and contextual information. Ablation experiments demonstrated that the improved YOLO-CRC model achieved a mean average precision (mAP) of 94.1%, with segmentation accuracies of 95.9% and 92.2% for eggplants and eggplant pedicels, respectively, and an mAP of 70.6% over the IoU range of 0.5 to 0.95. An additional dataset, excluded from training, was constructed to test generalization on new data, and the YOLO-CRC model exhibited favorable generalization and processing performance. Although the overall processing speed was slightly slower than that of the comparison models, the frame rate of 44 frames per second (FPS) fully meets the requirements of most experimental scenarios. Grad-CAM++ heatmap analysis revealed that the improved model focused its attention more effectively on eggplants and pedicels. On this basis, 2D localization of eggplant pedicel harvesting points was achieved using the enhanced segmentation model.
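To make the branch-summation idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the branch choices (3x3 convolution, 1x1 convolution, and 1x1 convolution followed by average pooling), the channel sizes, and the class name are illustrative assumptions, and the inference-time reparameterization that DBB uses to merge the branches into a single convolution is omitted.

```python
# Minimal sketch (assumed, not the authors' code) of a DBB-style multi-branch
# block: parallel branches with different kernel sizes/operation types whose
# outputs are aggregated by summation, as described for the CDCP module.
import torch
import torch.nn as nn


class DiverseBranchSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Branch 1: standard 3x3 convolution (local spatial features)
        self.branch_3x3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        # Branch 2: 1x1 convolution (cheaper, channel-wise mixing)
        self.branch_1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        # Branch 3: 1x1 convolution followed by average pooling (context aggregation)
        self.branch_pool = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
        )
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Outputs of the diverse branches are aggregated by summation,
        # fusing features extracted at different scales.
        y = self.branch_3x3(x) + self.branch_1x1(x) + self.branch_pool(x)
        return self.act(self.bn(y))


if __name__ == "__main__":
    block = DiverseBranchSketch(in_ch=64, out_ch=64)
    feat = torch.randn(1, 64, 80, 80)   # dummy backbone feature map
    print(block(feat).shape)            # -> torch.Size([1, 64, 80, 80])
```

In the actual DBB design, each branch carries its own batch normalization and the whole block is algebraically folded into one equivalent convolution before deployment, which is what makes the multi-branch training structure free at inference time.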
Furthermore, 3D localization of the harvesting points was realized by integrating a depth camera. Ten three-dimensional positioning experiments were conducted with the depth camera at five selected orientations. The results show that the average positioning error, defined as the mean of the total errors of each measurement, was 2.13 mm; the maximum error was 2.68 mm; and the average relative error was 1.18%. These findings provide valuable technical insight into visual recognition and localization for eggplant harvesting robots and contribute to advancing automated harvesting technologies for eggplant crops.
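As a concrete illustration of how such error statistics can be computed, the sketch below uses hypothetical coordinates rather than the paper's measurements, and it assumes the relative error is the Euclidean positioning error divided by the true camera-to-target distance; the variable names and values are placeholders only.

```python
# Minimal sketch (illustrative values, not the paper's data) of computing the
# average, maximum, and relative positioning errors from depth-camera trials:
# each trial compares a measured 3D harvesting point with its ground truth.
import numpy as np

# Hypothetical measured and ground-truth coordinates in millimetres.
measured = np.array([[412.1, 103.4, 598.2],
                     [388.7, 121.0, 610.5]])
ground_truth = np.array([[410.0, 104.0, 600.0],
                         [390.0, 120.0, 612.0]])

# Euclidean positioning error of each trial.
errors = np.linalg.norm(measured - ground_truth, axis=1)

# Assumed definition: relative error = positioning error / true target distance.
relative_errors = errors / np.linalg.norm(ground_truth, axis=1)

print(f"average error:  {errors.mean():.2f} mm")   # mean of per-trial errors
print(f"maximum error:  {errors.max():.2f} mm")
print(f"relative error: {relative_errors.mean() * 100:.2f} %")
```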