Abstract:
Orange (Citrus sinensis) is one of the most important fruit crops owing to its outstanding economic value, and its large-scale cultivation has become an important vehicle for rural revitalization in hilly areas. Manual harvesting cannot fully meet the demands of large-scale production, due to high labor costs, low efficiency, and a tendency to damage fruit. In particular, the shortage of agricultural labor has become increasingly acute with urbanization and an aging population, making the mechanization and intelligent transformation of harvesting operations inevitable for the development of the industry. This study aimed to enhance the efficiency and accuracy of real-time orange detection in unstructured orchard environments under the constraints of embedded edge computing platforms. A lightweight object detection model, named YOLOv8n-Light, was proposed in alignment with the Roofline performance model, in order to raise computational intensity relative to memory access; a systematic optimization was also made to balance resource consumption against detection accuracy. The backbone of the baseline YOLOv8n network was replaced with the lightweight ShuffleNetV2 architecture, which uses channel splitting, pointwise convolution, and depthwise separable convolution to minimize parameter size and computational cost while still extracting fine-grained features. Furthermore, a novel lightweight detection head was introduced on top of this backbone, sharing 3×3 convolutional kernels across feature pyramid levels; redundant parameter storage and activation memory traffic were thereby significantly reduced, yielding a streamlined and more efficient pipeline. The concatenation module was restructured to incorporate the SE (Squeeze-and-Excitation) attention mechanism, which recalibrates channel-wise responses according to feature importance.
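The parameter savings behind the ShuffleNetV2-style backbone come from the standard decomposition of a k×k convolution into a depthwise and a pointwise stage. The sketch below compares parameter counts for the two forms; the channel width of 116 is an illustrative value, not a figure taken from the paper.

```python
def conv_params(k, c_in, c_out):
    # Standard k x k convolution: every output channel mixes all input channels.
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    # Depthwise k x k convolution (one filter per input channel),
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

std = conv_params(3, 116, 116)          # 121,104 parameters
sep = dw_separable_params(3, 116, 116)  # 14,500 parameters
print(std, sep, round(std / sep, 1))    # roughly an 8x reduction
```

For large channel counts the ratio approaches k² (here 9×), which is what makes depthwise separable convolutions attractive on memory-bound embedded platforms.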
The SE module enhanced the network's sensitivity to relevant object features under complex conditions, such as varying illumination, background clutter, and partial occlusion. The loss function was redesigned to integrate MPDIoU (Minimum Point Distance Intersection over Union) and Focaler-IoU (Focalized Intersection over Union), in order to improve localization. This hybrid loss imposed stronger penalties on inaccurate bounding boxes and dynamically balanced the precision-recall trade-off according to the quality of each prediction, resulting in high regression accuracy and robust detection sensitivity. A series of experiments was conducted on a Raspberry Pi 4B platform with 8 GB of RAM. The YOLOv8n-Light model reached an inference speed of 2.8 FPS (frames per second), a 64.7% increase over the original YOLOv8n. The model attained a precision of 96.5%, 2.2 percentage points higher than the original YOLOv8n, a recall of 89.5%, and a mAP (mean Average Precision) of 97.0%. Field evaluations were carried out in an orchard using a six-degree-of-freedom robotic arm equipped with an Intel RealSense depth camera. The average positioning errors were 2.48 mm along the X-axis, 3.13 mm along the Y-axis, and 4.13 mm along the Z-axis. The robotic fruit-picking system achieved a recognition accuracy of 97.59%, a localization accuracy of 96.39%, and an overall picking success rate of 93.98%, confirming the applicability of the system under real-world agricultural conditions. In conclusion, the YOLOv8n-Light model effectively balanced computational efficiency and detection accuracy on resource-constrained embedded platforms by combining architectural improvements, attention mechanisms, and an optimized loss function.
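The two loss components can be sketched numerically. The MPDIoU term below follows the commonly published formulation (IoU minus the squared top-left and bottom-right corner distances, normalized by the image diagonal), and the Focaler-IoU term is a linear remapping of IoU onto a quality band [d, u]; the threshold values and the boxes are illustrative assumptions, since the abstract does not give the authors' exact settings or weighting.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns plain Intersection over Union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def mpd_iou(pred, gt, img_w, img_h):
    # MPDIoU penalizes the squared distances between matching top-left
    # and bottom-right corners, normalized by the squared image diagonal.
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - d1 / norm - d2 / norm

def focaler_iou(v, d=0.0, u=0.95):
    # Focaler-IoU linearly remaps an IoU value so training focuses on a
    # chosen quality band; d and u here are illustrative thresholds.
    return min(1.0, max(0.0, (v - d) / (u - d)))

# A slightly shifted prediction is penalized beyond its plain IoU:
print(iou((2, 2, 12, 12), (0, 0, 10, 10)))            # ~0.47
print(mpd_iou((2, 2, 12, 12), (0, 0, 10, 10), 100, 100))
```

Using 1 minus these quantities as the regression loss gives the behavior described above: corner-distance penalties sharpen localization, while the Focaler remapping shifts gradient emphasis between easy and hard samples.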
Reliable performance was achieved in both controlled and real-world orchard environments. This lightweight refinement of citrus fruit detection can serve as a strong reference for automated harvesting equipment.