Abstract:
Harvesting robots in smart agriculture require rapid and accurate detection of pear fruit growth quality, together with real-time picking-point localization, in complex orchard environments. In this study, an improved YOLOv8n model was proposed for pear fruit detection and localization, targeting high accuracy under occlusion and variable illumination while maintaining high inference speed on embedded devices. Three enhancements were incorporated: (1) The C2f modules in the backbone network were replaced with FasterNet Blocks built on partial convolution, significantly reducing computational redundancy and improving memory access efficiency. (2) A global attention mechanism was introduced after the spatial pyramid pooling fast (SPPF) layer to extract critical features, focusing more effectively on small targets while suppressing background interference. (3) The original CIoU loss function was replaced with Inner-CIoU, with a scale factor of 0.8 selected through systematic experimentation; this accelerated convergence and enhanced the gradient quality and localization precision for small and overlapping pear fruits. An image dataset was also constructed to verify the improved model. Pear fruit images were captured with an Intel RealSense D455i binocular camera in natural orchards, covering multiple varieties and challenging conditions such as disease, fruit overlap, and occlusion; data augmentation expanded the dataset to 3,000 images. Experimental results demonstrate that the improved YOLOv8n-Pear model achieved a precision of 96.80%, a recall of 93.40%, and a mean average precision of 96.70%, improvements of 4.0, 3.2, and 4.0 percentage points over the baseline YOLOv8n, respectively. Moreover, floating-point operations were reduced by 30.23% and the memory footprint by 48.15%, from 7.1 to 4.2 MB.
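The Inner-IoU idea behind enhancement (3) can be sketched as follows. This is a minimal illustration, assuming the standard Inner-IoU formulation in which both the predicted and ground-truth boxes are shrunk by the scale factor (here 0.8) around their centers before the IoU is computed; the function and variable names are illustrative, not taken from the paper's implementation:

```python
def inner_iou(box_p, box_gt, ratio=0.8):
    """Compute IoU on auxiliary boxes scaled by `ratio` around each
    box's center (ratio < 1 shrinks the boxes, sharpening the loss
    gradient for high-overlap samples). Boxes are (cx, cy, w, h)."""
    def scaled_corners(box):
        cx, cy, w, h = box
        return (cx - w * ratio / 2, cy - h * ratio / 2,
                cx + w * ratio / 2, cy + h * ratio / 2)

    l1, t1, r1, b1 = scaled_corners(box_p)
    l2, t2, r2, b2 = scaled_corners(box_gt)

    # Intersection of the two auxiliary (inner) boxes
    iw = max(0.0, min(r1, r2) - max(l1, l2))
    ih = max(0.0, min(b1, b2) - max(t1, t2))
    inter = iw * ih

    # Union of the auxiliary boxes
    union = (box_p[2] * box_p[3] + box_gt[2] * box_gt[3]) * ratio ** 2 - inter
    return inter / union if union > 0 else 0.0
```

In Inner-CIoU, this inner IoU replaces the plain IoU term of the CIoU loss, while the center-distance and aspect-ratio penalty terms are kept unchanged.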
On the embedded Jetson Orin NX platform, the model achieved an average inference speed of 180.3 frames per second at a power consumption of only 19 W, demonstrating real-time deployment on power-constrained systems. The binocular camera was calibrated for 3D localization, and a coordinate transformation was established to convert the 2D pixel coordinates of healthy pear fruits into 3D world coordinates. Field tests show that the maximum positioning errors in the X, Y, and Z directions were 12, 12, and 10 mm, respectively, with average errors of 6.6, 7.1, and 7.1 mm, all within acceptable limits for robotic harvesting. Finally, the vision system was integrated with a four-degree-of-freedom harvesting actuator on outdoor Y-trellis pear trees. The system achieved a harvesting success rate of approximately 90.2% and an average continuous picking time of about 5 s per fruit over ten experimental groups totaling 100 picking attempts, fully meeting the practical requirements of robotic harvesting. The improved YOLOv8n model effectively balanced high accuracy with low computational cost. These findings provide a robust solution for visual perception in fruit-harvesting robots, particularly on resource-constrained embedded platforms.
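The 2D-to-3D coordinate transformation described above can be sketched with the standard pinhole back-projection followed by an extrinsic camera-to-world transform. The intrinsic and extrinsic values below are placeholders for illustration only, not the calibrated parameters of the D455i:

```python
import numpy as np

def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth `depth_m` (metres)
    into 3D camera-frame coordinates using pinhole intrinsics
    (focal lengths fx, fy and principal point cx, cy in pixels)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def camera_to_world(p_cam, R, t):
    """Map camera-frame coordinates into the robot/world frame
    using the calibrated extrinsics (rotation R, translation t)."""
    return R @ p_cam + t

# Placeholder calibration values, for illustration only
fx = fy = 600.0
cx, cy = 320.0, 240.0
p_cam = pixel_to_camera_xyz(400, 300, 0.8, fx, fy, cx, cy)
p_world = camera_to_world(p_cam, np.eye(3), np.zeros(3))
```

In practice the depth at the detected picking point would come from the stereo camera's aligned depth stream, and R and t from a hand-eye calibration between the camera and the harvesting actuator.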