Multi-label recognition of ripe persimmon varieties and phenotypic characteristics based on improved YOLOv8m

耿耀君; 马萍; 李雯敏; 苗园爽; 关长飞; 李润雨; 黄铝文

doi:10.11975/j.issn.1002-6819.202503128

Abstract

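Abstract: The phenotypic characteristics of persimmon fruit are an important basis for identifying germplasm resources. To identify the varieties and phenotypic characteristics of ripe persimmons quickly and accurately, this study proposes YOLOv8m-LCA, a multi-label recognition model for ripe persimmon varieties and their phenotypic characteristics based on an improved YOLOv8m. Because the contours of ripe persimmons in natural environments are irregular and difficult to extract accurately, a large separable kernel attention (LSKA) module is added to the C2f (cross-stage partial-connection with 2 convolutions) module of the YOLOv8m network to enhance the extraction of fruit edge details and reduce the model's computational load. To raise the recognition rate for key phenotypic characteristics, four convolutional block attention modules (CBAM) are added to the backbone network to strengthen weighted attention over the spatial and channel dimensions, and a dual-path downsampling module replaces the original convolutional downsampling module to reduce the loss of high-frequency information in the feature maps. The classification loss function of the head network is adjusted so that each detected instance is output with multiple labels covering both variety and phenotypic characteristics. On a self-built dataset of 30 persimmon varieties and 14 classes of phenotypic characteristics, YOLOv8m-LCA achieves an overall precision (P) of 93.0% and a mean average precision (mAP@50) of 94.9%, which are 7.4 and 6.8 percentage points higher than YOLOv8m, respectively. The model size and parameter count are 27.6 MB and 13.63 M, reductions of 47.02% and 47.29%, respectively, and the model outperforms current mainstream lightweight YOLO-series detectors such as YOLOv5m, YOLOv11m and YOLOv12m. For six randomly selected persimmon varieties and their phenotypic characteristics, the detection confidence of YOLOv8m-LCA ranges from 0.90 to 0.97, which supports the effectiveness of the improved model in extracting key persimmon features. The method can serve as a model reference for identifying the phenotypic characteristics of persimmon and other fruit germplasm resources.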
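The abstract states that the classification loss of the head network was adjusted to emit multi-label instances, but it does not give the exact formulation. The sketch below illustrates one common way to obtain such outputs, assuming independent per-class sigmoid scores trained with binary cross-entropy against a multi-hot target. The class names, logits, and the 0.5 threshold are hypothetical placeholders, not values from the paper.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-class sigmoid outputs.

    `targets` is a multi-hot vector: several entries may be 1 at once,
    e.g. one variety label plus several phenotype labels.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)

def decode_labels(logits, names, threshold=0.5):
    """Keep every class whose sigmoid score reaches the threshold."""
    return [n for z, n in zip(logits, names) if sigmoid(z) >= threshold]

# Hypothetical label set: two varieties and two phenotype classes.
CLASS_NAMES = [
    "variety_A",
    "variety_B",
    "fruit_shape_round",
    "longitudinal_groove_present",
]

logits = [3.0, -4.0, 2.5, 1.2]   # assumed head outputs for one detected fruit
targets = [1, 0, 1, 1]           # multi-hot ground truth for the same fruit

loss = multilabel_bce(logits, targets)
labels = decode_labels(logits, CLASS_NAMES)
print(f"loss={loss:.4f}", labels)
```

Unlike a softmax, which forces the classes to compete for a single label, independent sigmoids let one bounding box carry a variety label and several phenotype labels at the same time.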

Abstract: Persimmon fruits, originating from China, have a cultivation history exceeding 3 000 years. In production practice, the phenotypic characteristics of persimmon fruits serve as crucial criteria for variety identification. Attributes such as fruit shape, longitudinal groove, fruit apex, cross furrow, and fruit indent form the basis for naming and classifying new varieties, which underscores the significance of phenotypic traits for germplasm resource identification. Nevertheless, persimmon germplasm is diverse and abundant, with strong similarities between varieties, making manual identification time-consuming and labor-intensive. Moreover, the phenotypic parameters of naturally grown ripe persimmons vary considerably (for example, inconsistent outline shapes and sizes, varying depths of longitudinal grooves, cross furrows, and fruit indents, and differences in the spatial geometry of the fruit apex), so it is difficult to use a single phenotypic parameter as a baseline for accurately observing and discriminating among varieties. Currently, research on phenotypic multi-label recognition focuses primarily on the extraction of fruit phenotypic parameters, with relatively few studies dedicated to phenotypic recognition of persimmon fruits. To achieve rapid and accurate identification of ripe persimmon varieties and their phenotypic traits, an enhanced multi-label recognition model, YOLOv8m-LCA, is proposed based on YOLOv8m. Here, LCA denotes the three modules newly introduced into the YOLOv8m baseline architecture: L stands for the LSKA (large separable kernel attention) module, C for the CBAM (convolutional block attention module), and A for the Adown subsampling module.
To address the irregular size of naturally ripened persimmons and the difficulty of accurately extracting their contours, a large separable kernel attention module was added to the C2f (cross-stage partial-connection with 2 convolutions) module in the YOLOv8m network, enhancing fruit edge feature details and reducing the computational load of the model. To improve the model's recognition rate for key phenotypic features, four convolutional block attention modules were added to the backbone network to strengthen its weighted attention in both the spatial and channel dimensions. Furthermore, a dual-path subsampling module replaced the original convolutional subsampling module, reducing the loss of high-frequency information in the feature maps. The classification loss function of the head network was adjusted to enable the output of multi-label instances encompassing both varieties and phenotypic features. During image acquisition, objective factors such as variations in light intensity were taken into account to obtain a persimmon image dataset covering 30 varieties. Images that failed to accurately reflect the target features were removed, and augmentation was applied to the training and validation splits, yielding a total of 5 060 images in the augmented dataset. The results show that, on the self-constructed dataset of 30 persimmon varieties and 14 types of phenotypic characteristics, the proposed YOLOv8m-LCA model achieved an overall precision of 93.0%, a recall of 92.0%, and an mAP@50 of 94.9%, which are 7.4, 5.8 and 6.8 percentage points higher than those of YOLOv8m, respectively. In addition, the model size, floating-point operations (FLOPs), and number of parameters are 27.6 MB, 41.3 G and 13.63 M, which are 47.02%, 47.59% and 47.29% lower than those of YOLOv8m, respectively.
The proposed method outperforms other current and classical YOLO-series algorithms, such as YOLOv5m, YOLOv9m, YOLOv10m, YOLOv11m and YOLOv12m. It provides a model reference for identifying the phenotypic characteristics of persimmon and other fruit germplasm resources, and can also be applied to multi-label feature extraction for crops with more complex spatial structures, edges and textures in natural growing environments.
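The abstract reports the improved model's figures together with their changes relative to YOLOv8m, but not the baseline values themselves. As a quick consistency check, the sketch below back-computes the baseline figures these numbers imply. The helper name and dictionary keys are illustrative, and the results are derived estimates subject to rounding, not values reported by the authors.

```python
def baseline_from_reduction(value: float, reduction_pct: float) -> float:
    """Invert 'value is reduction_pct % lower than baseline'."""
    return value / (1.0 - reduction_pct / 100.0)

# (reported value for YOLOv8m-LCA, reported relative reduction in %)
reported_cost = {
    "model size (MB)": (27.6, 47.02),
    "FLOPs (G)": (41.3, 47.59),
    "parameters (M)": (13.63, 47.29),
}

# (reported value in %, reported gain in percentage points)
reported_accuracy = {
    "precision (%)": (93.0, 7.4),
    "recall (%)": (92.0, 5.8),
    "mAP@50 (%)": (94.9, 6.8),
}

# Relative reductions are inverted by division.
implied_baseline_cost = {
    name: round(baseline_from_reduction(value, pct), 2)
    for name, (value, pct) in reported_cost.items()
}

# Percentage-point gains are inverted by plain subtraction.
implied_baseline_accuracy = {
    name: round(value - gain, 1)
    for name, (value, gain) in reported_accuracy.items()
}

for name, value in {**implied_baseline_cost, **implied_baseline_accuracy}.items():
    print(f"implied YOLOv8m {name}: {value}")
```

Under these assumptions, the implied YOLOv8m baseline is roughly 52.1 MB, 78.8 G FLOPs and 25.9 M parameters, with a precision, recall and mAP@50 of about 85.6%, 86.2% and 88.1%.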
