Abstract:
Green, pollution-free vegetables have become increasingly popular in recent years as awareness of food safety has grown. Producing safe, non-toxic vegetables requires controlling the use of fertilizers and pesticides. At the same time, pests seriously threaten the yield and quality of tomatoes. Timely and accurate pest identification therefore helps protect tomato quality and yield with fewer chemical inputs. However, tomato pests vary widely in form and appear against complex backgrounds. Manual detection is highly subjective and prone to missed detections, so it cannot meet the needs of large-scale production. Practical use requires both accurate pest detection and a lightweight model that can be deployed, so detection performance must be balanced against model complexity. This study proposes a lightweight improved model, YOLOv8n-DFS, to detect tomato pests efficiently and accurately in complex environments on edge computing devices. Firstly, the neck structure was replaced with the Cross-scale Feature Fusion Module (CCFM), which fuses, modulates, and reuses the features of different detection layers to reduce model complexity. The model can thus integrate feature information from multiple scales, which improves robustness in complex scenarios. Secondly, the ADown module was introduced to replace some of the Conv modules. Through combined pooling, detection performance was retained with fewer channels and a smaller spatial dimension, so that the model was optimized in both the channel and scale dimensions. Together with CCFM, this reduced the structural complexity, computational load, and parameter count of the model. Finally, a lightweight self-attention detection head was designed. The computational load of the detection head was reduced by combining downsampling and upsampling.
Moreover, the self-attention mechanism was used to strengthen global context information and long-distance dependencies, which improved feature extraction in complex scenes. A skip connection was also added to the attention mechanism to avoid overfitting caused by excessive reliance on attention; the original features were retained while irrelevant background was suppressed. A series of tests was conducted on a self-built dataset covering five pests: Aphididae, Bemisia tabaci, Agrotis ypsilon, Helicoverpa armigera, and Spodoptera litura. The results showed that each modification improved the detection performance of the YOLOv8n-DFS model, reduced its complexity, or both. YOLOv8n-DFS outperformed the mainstream models Faster R-CNN, SSD, YOLOv7-tiny, YOLOv8n, YOLOv9-t, YOLOv10n, YOLOv11n, and YOLOv12n in both detection performance and lightweight degree. The precision, recall, and mAP of YOLOv8n-DFS reached 91.5%, 88.9%, and 93.5%, respectively, which are 0.7, 1.8, and 0.5 percentage points higher than those of YOLOv8n. The FLOPs, parameter count, and model size of YOLOv8n-DFS were 3.2 G, 1.497×10^6, and 3.13 MB, respectively, which are 60.5%, 50.2%, and 47.7% lower than those of YOLOv8n. Compared with Faster R-CNN, SSD, and the other five YOLO-series models (YOLOv7-tiny, YOLOv9-t, YOLOv10n, YOLOv11n, and YOLOv12n), the mAP increased by 4.9, 2.2, 1.2, 0.4, 1.7, 1.1, and 1.3 percentage points, respectively. The FLOPs were reduced by 945.0, 56.9, 10.0, 7.5, 5.0, 3.1, and 3.3 G, the parameter counts by 26.8, 10.7, 4.5, 1.1, 1.2, 1.1, and 1.1 M, and the model sizes by 104.9, 44.1, 8.6, 2.7, 2.4, 2.1, and 2.2 MB, respectively. A deployment test was performed on an edge computing device. The frame rate of YOLOv8n-DFS reached 12.2 frames per second, and 25.3 frames per second after TensorRT acceleration, a 59.1% increase over YOLOv8n. On the edge device, the detections of YOLOv8n-DFS were closer to the ground truth than those of YOLOv8n. Of 33 pest targets, YOLOv8n-DFS missed only 2 and misdetected 1, indicating that the model is both lightweight and better at detection. The proposed model copes effectively with interference in complex scenes and meets the high-precision and lightweight requirements of pest detection on edge computing devices.
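
As a quick consistency check on the reported reductions, the sketch below back-calculates the YOLOv8n baseline implied by each YOLOv8n-DFS figure and its stated percentage decrease. The helper names (`implied_baseline`, `relative_reduction`) are illustrative assumptions, not part of the study, and the results are only as precise as the rounded values given in the abstract.

```python
def implied_baseline(value, reduction_pct):
    """Baseline implied by a reduced value and its percentage reduction."""
    return value / (1.0 - reduction_pct / 100.0)


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


# (YOLOv8n-DFS value, stated reduction vs. YOLOv8n in %), as given in the abstract
reported = {
    "FLOPs (G)": (3.2, 60.5),
    "parameters (M)": (1.497, 50.2),
    "model size (MB)": (3.13, 47.7),
}

# Hypothetical back-calculation; the baselines themselves are not stated in the abstract.
implied = {
    name: implied_baseline(value, pct) for name, (value, pct) in reported.items()
}

for name, base in implied.items():
    value, pct = reported[name]
    print(f"{name}: YOLOv8n-DFS = {value}, implied YOLOv8n baseline = {base:.2f}")
```

Under these assumptions, the implied YOLOv8n baselines come out at roughly 8.1 G FLOPs, 3.0 M parameters, and 6.0 MB, subject to the rounding of the reported percentages.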