Abstract:
Persimmon, which originated in China, has a cultivation history exceeding 3 000 years. In production practice, the phenotypic characteristics of persimmon fruits serve as crucial criteria for variety identification. Attributes such as fruit shape, longitudinal groove, fruit apex, cross furrow, and fruit indent constitute the foundation for naming and classifying new varieties, underscoring the significance of phenotypic traits for germplasm resource identification. Nevertheless, persimmon germplasm is rich and diverse, and many varieties closely resemble one another, making manual identification time-consuming and labor-intensive. Moreover, the phenotypic parameters of naturally grown ripe persimmons exhibit varying degrees of heterogeneity and diversity, such as inconsistent outline shapes and sizes, varying depths of longitudinal grooves, cross furrows, and fruit indents, and differences in the spatial geometry of the fruit apex. This makes it challenging to use a single phenotypic parameter as a baseline for accurately observing and discriminating among varieties. Currently, research on phenotypic multi-label recognition primarily focuses on the extraction of fruit phenotypic parameters, with relatively few studies dedicated to phenotypic recognition of persimmon fruits. To achieve rapid and accurate identification of ripe persimmon varieties and their phenotypic traits, an enhanced multi-label recognition model, YOLOv8m-LCA, is proposed based on YOLOv8m. Here, LCA denotes the three modules introduced into the baseline YOLOv8m network architecture: L stands for the LSKA (large separable kernel attention) module, C for the CBAM (convolutional block attention module), and A for the ADown subsampling module.
To address the irregular size of naturally ripened persimmons and the difficulty of accurately extracting their contours, a large separable kernel attention module was added to the C2f (cross-stage partial-connection with 2 convolutions) module of the YOLOv8m network, enhancing fruit edge feature details and reducing the computational load of the model. To improve the recognition rate for key phenotypic features, four convolutional block attention modules were added to the backbone network to strengthen its weighted attention in both the spatial and channel dimensions. Furthermore, a dual-path subsampling module replaced the original convolutional subsampling module, reducing the loss of high-frequency information in the feature maps. The classification loss function of the head network was optimized to enable multi-label output for each instance, covering both variety and phenotypic features. During image acquisition, objective factors such as variations in light intensity were taken into account to obtain a persimmon image dataset covering 30 varieties. Images that failed to accurately reflect the target features were removed, and augmentation was applied to the training and validation images after the dataset was split, yielding a total of 5 060 images in the augmented dataset. On the self-constructed dataset of 30 persimmon varieties and 14 types of phenotypic characters, the proposed YOLOv8m-LCA model achieves an overall recognition precision of 93.0%, a recall of 92.0%, and an mAP@50 of 94.9%, which are increases of 7.4, 5.8 and 6.8 percentage points over YOLOv8m, respectively. Additionally, the model size, the number of floating-point operations, and the number of parameters are 27.6 MB, 41.3 G and 13.63 M, which are 47.02%, 47.59% and 47.29% lower than those of YOLOv8m, respectively.
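The multi-label output described above, in which one detected fruit carries both a variety label and phenotypic trait labels, can be illustrated with a minimal sketch. It assumes independent per-label sigmoid scores trained with binary cross-entropy, which is a common multi-label formulation; the abstract does not specify the exact loss used in YOLOv8m-LCA, and the label names, logits and threshold below are hypothetical placeholders rather than values from the study.

```python
import math

# Hypothetical label space mixing varieties and phenotypic traits.
# The actual dataset has 30 varieties and 14 trait classes; these names are placeholders.
LABELS = ["variety_A", "variety_B", "shape_flat", "shape_round", "groove_deep"]


def sigmoid(x):
    """Logistic function mapping a logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-label sigmoid outputs."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z * t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def decode_labels(logits, threshold=0.5):
    """Return every label whose sigmoid score exceeds the threshold (assumed 0.5)."""
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) > threshold]


# Made-up logits for one detected fruit and its multi-hot ground truth.
logits = [3.0, -2.5, 2.0, -1.0, 1.5]
targets = [1.0, 0.0, 1.0, 0.0, 1.0]

loss = bce_with_logits(logits, targets)
predicted = decode_labels(logits)
print(predicted, round(loss, 4))
```

Because each label is scored independently rather than through a softmax over all classes, a single instance can be assigned one variety and several traits at once.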
The proposed method outperforms other current and classical YOLO series algorithms, including YOLOv5m, YOLOv9m, YOLOv10m, YOLOv11m and YOLOv12m. It provides a model reference for identifying the phenotypic characteristics of persimmon and other fruit germplasm resources, and can also be applied to multi-label feature extraction for crops with more complex spatial structures, edges and textures in natural growing environments.