Abstract:
Surface defects seriously degrade the grading efficiency and market value of apples, so accurate and rapid post-harvest quality detection is in urgent demand. However, conventional single-view imaging captures only part of the fruit surface, owing to the irregular spherical shape of apples, and defects in the unimaged areas are therefore often missed. Eliminating these large blind areas is required for high-precision quality inspection in the intelligent upgrading of the apple industry. In this study, a three-view imaging method was proposed to detect apple surface defects, in which the high redundancy inherent in multi-view imaging was also reduced, so that defect features could be detected under complex backgrounds. Reliable detection was realized through three key technical procedures.

Firstly, an image dataset of apple surface defects was collected. Three Intel RealSense D415 depth cameras were deployed to accurately cover the imaging area, and the hardware synchronization function of the cameras was used to acquire data simultaneously in real time. After acquisition and intrinsic parameter calibration, the RGB and depth images were converted into three-dimensional point clouds. A series of point cloud processing steps was then performed, including preprocessing, coarse registration, fine registration, downsampling, and surface reconstruction. The 3D reconstruction of the apples enabled the joint imaging area of the three-view system to be calculated. Secondly, a region segmentation method suitable for three-view apple images was proposed using a standard sphere model, with which the redundant background and overlapping regions were removed from the three-view images. Thirdly, the basic You Only Look Once version 11 (YOLOv11) model was improved to enhance detection performance. Specifically, the C3k module in the Neck of the original model was replaced with a Non-local Attention Residual Multi-Layer Perceptron (NARM) module, and the resulting NARM-YOLOv11 model captured long-range feature dependencies to identify small-scale defects. A series of experiments was carried out to verify the effectiveness of the improved system.

The results showed that the three-view imaging fused the multi-angle surface information of the apples after precise point cloud registration and reconstruction. The average proportion of the imaged apple surface increased from 34.6% with single-view imaging to 74.3%, significantly reducing the detection blind area and covering most of the apple surface. In the image segmentation with the standard sphere model, the redundant regions were effectively removed from the three-view images, with an average redundant-region removal rate of 20.5%. Furthermore, the average defect detection repetition rate caused by overlapping imaging areas was reduced from 26.0% in the original images to 7.6% after segmentation, while the average missed detection rate was kept to 3.6%. The high redundancy of multi-view imaging was thus avoided, supporting high accuracy in the subsequent defect identification.
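The conversion from calibrated RGB-D frames to point clouds follows the standard pinhole back-projection, X = (u - cx)z/fx, Y = (v - cy)z/fy, Z = z. A minimal Python sketch is given below; the image size and intrinsic values are placeholders rather than the calibrated D415 parameters.

import numpy as np

def depth_to_points(depth, rgb, fx, fy, cx, cy):
    # Back-project each pixel (u, v) with depth z to
    # X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                                 # drop pixels without depth
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)[valid]  # (N, 3) coordinates
    colors = rgb[valid] / 255.0                       # (N, 3) RGB in [0, 1]
    return points, colors

# Synthetic example at an assumed 1280 x 720 resolution; the intrinsics
# below are placeholders for the calibrated D415 values.
depth = np.full((720, 1280), 0.4)                     # a flat plane 0.4 m away
rgb = np.zeros((720, 1280, 3), dtype=np.uint8)
pts, cols = depth_to_points(depth, rgb, fx=900.0, fy=900.0, cx=640.0, cy=360.0)
print(pts.shape)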
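A minimal sketch of the point cloud pipeline is shown below, assuming the Open3D library: FPFH feature matching with RANSAC stands in for the coarse registration and point-to-plane ICP for the fine registration, which is one common realization of the named steps rather than the exact implementation of this study. The file names, voxel size, and distance thresholds are illustrative.

import open3d as o3d

VOXEL = 0.002  # assumed 2 mm downsampling voxel

def preprocess(pcd):
    # Downsample, denoise, and compute normals plus FPFH features.
    down = pcd.voxel_down_sample(VOXEL)
    down, _ = down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 5, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down,
        o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 10, max_nn=100))
    return down, fpfh

def register(source, target):
    # Coarse alignment by FPFH feature matching with RANSAC,
    # refined by point-to-plane ICP.
    src, src_f = preprocess(source)
    tgt, tgt_f = preprocess(target)
    reg = o3d.pipelines.registration
    coarse = reg.registration_ransac_based_on_feature_matching(
        src, tgt, src_f, tgt_f, True, VOXEL * 3,
        reg.TransformationEstimationPointToPoint(False), 3, [],
        reg.RANSACConvergenceCriteria(100000, 0.999))
    fine = reg.registration_icp(
        src, tgt, VOXEL * 1.5, coarse.transformation,
        reg.TransformationEstimationPointToPlane())
    return fine.transformation

# Fuse the three synchronized views into one apple point cloud
# (the view_*.ply file names are hypothetical).
views = [o3d.io.read_point_cloud(f"view_{i}.ply") for i in range(3)]
merged = views[0]
for v in views[1:]:
    v.transform(register(v, merged))
    merged += v
merged = merged.voxel_down_sample(VOXEL)

# Poisson surface reconstruction, from which the imaging area follows.
merged.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 5, max_nn=30))
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(merged, depth=8)
print("reconstructed surface area:", mesh.get_surface_area())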
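The internal details of the standard sphere model segmentation are not given here; one plausible reading is sketched below, in which each surface point is assigned to the camera that views it most frontally, so that the three views partition the sphere without overlap and each defect is counted once. The 120 degree camera layout and the ownership rule are assumptions for illustration.

import numpy as np

def assign_views(points, center, cam_dirs):
    # Each sphere-surface point has outward normal (p - center), normalized;
    # it is owned by the camera whose viewing direction best matches it.
    normals = points - center
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    scores = normals @ np.asarray(cam_dirs).T  # (N, n_views) alignment
    return scores.argmax(axis=1)               # owning view per point

# Assumed layout: three cameras 120 deg apart in the horizontal plane,
# each unit vector pointing from the sphere center toward a camera.
cam_dirs = [(np.cos(a), np.sin(a), 0.0) for a in np.deg2rad([0.0, 120.0, 240.0])]

# Example on a unit sphere: a defect detected in view i is kept only
# where owner == i, so overlapping regions are not double-counted.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
owner = assign_views(pts, np.zeros(3), cam_dirs)
print(np.bincount(owner, minlength=3))  # points owned by each view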
In the model tests, the precision, recall, and mean average precision (mAP) of NARM-YOLOv11 increased by 2.7, 2.5, and 3.4 percentage points, respectively, compared with the basic YOLOv11, showing that the NARM module enhanced feature extraction, especially for the small-scale and low-contrast surface defects of apples. Although the model complexity increased slightly with the added attention mechanism and multi-layer perceptron structure, the frame rate decreased by only 1.7 frames per second, fully meeting the requirements of real-time detection in practical applications. For the integrated system, the average precision of apple surface defect detection reached 89.7%, and the average defect recognition rate was 88.1%, indicating high reliability and practicality. The three-view imaging, sphere-model image segmentation, and improved NARM-YOLOv11 model were combined to detect defect features under complex backgrounds, avoiding both the large blind area of single-view imaging and the high redundancy of raw three-view imaging. This full-surface defect detection of spherical fruits can provide a feasible technical scheme for the intelligent upgrading of post-harvest quality inspection in the apple industry, and offers solid support for combining multi-view imaging with deep learning in modern agriculture.
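The internal structure of the NARM module is not specified in this abstract; the following PyTorch sketch combines a non-local self-attention step with a residual multi-layer perceptron, as the module name suggests, and is an assumed layout rather than the published design. The channel width and feature map size are illustrative.

import torch
import torch.nn as nn

class NARM(nn.Module):
    def __init__(self, channels, mlp_ratio=2):
        super().__init__()
        # Non-local (self-attention) branch with halved embedding width.
        self.theta = nn.Conv2d(channels, channels // 2, 1)  # query
        self.phi = nn.Conv2d(channels, channels // 2, 1)    # key
        self.g = nn.Conv2d(channels, channels // 2, 1)      # value
        self.out = nn.Conv2d(channels // 2, channels, 1)
        # Residual MLP implemented with 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(channels * mlp_ratio, channels, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C/2)
        k = self.phi(x).flatten(2)                     # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW, C/2)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)  # pairwise weights
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        x = x + self.out(y)      # long-range dependencies, residual add
        return x + self.mlp(x)   # residual MLP refinement

# Example on a neck-scale feature map (assumed 256 channels, 20 x 20).
feat = torch.randn(1, 256, 20, 20)
print(NARM(256)(feat).shape)  # torch.Size([1, 256, 20, 20])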