Abstract:
Rice is a staple crop worldwide, and its yield and quality directly influence global food security and the agricultural economy. The International Rice Research Institute reports that pests and diseases cut farmers' rice yields by an average of 37 percent, with observed losses ranging from 24 percent to 41 percent. Real-time monitoring and counting of pests is the cornerstone of any green prevention and control system: accurate identification of pests at different scales and reliable tracking of their population dynamics provide the data required for science-based interventions, thereby reducing pesticide use and residue risk. Rice pests rank among the most common biological disasters that threaten stable, high-yield rice production. They disturb normal plant growth by gnawing leaves, boring stems, or sucking sap. At the least, they stunt plants, yellow leaves, and lower grain number; at worst, they cause total crop failure. Yet the pest community in paddy fields is extraordinarily diverse. Sizes range from millimetre-scale aphids and thrips to stem borers and leaf folder larvae exceeding ten millimetres. All may co-occur in the same plot and simultaneously inhabit leaves, leaf sheaths, stems, or panicles. Together, the complex spatial distribution, extreme morphological variation, and heavy background clutter mean that traditional manual scouting and simple image-processing methods cannot meet the demands of accurate detection and counting. To overcome these challenges, we developed YOLO-MSLP (multi-scale lightweight pest), an intelligent lightweight model for rice-pest detection and counting. Built upon the YOLOv11n backbone, YOLO-MSLP introduced three key innovations tailored to the complexities of paddy-field scenes. First, an adaptive-pooling bidirectional feature pyramid network (AP-BiFPN) was embedded in the neck. 
Through adaptive pooling, which dynamically adjusted the receptive field, and bidirectional cross-scale fusion, this module allowed the model to extract and aggregate multi-scale features stably, whether the targets were solitary pests or dense clusters. This improved both the completeness of small-object detection and the accuracy of large-object localisation. Second, a multi-scale triplet attention module (MS-TAM) was inserted between the backbone and detection heads. Operating in parallel across channel, spatial, and scale dimensions, this module adaptively highlighted discriminative pest features such as shape, texture, and colour while suppressing redundant background information that closely resembled the pests. Experiments showed that the module maintained high-confidence outputs even under backlighting, leaf occlusion, or overlapping rice plants. Finally, to lower deployment barriers, the backbone was re-engineered with a reparameterized vision transformer (RepViT) and further compressed through knowledge distillation, transferring rich representations from a larger teacher network into the lightweight student. After pruning, quantization, and operator fusion, YOLO-MSLP achieved a mean Average Precision (mAP) of 94.5 % and a recall of 91.7 %, representing improvements of 2.8 % and 2.3 %, respectively. Floating-point operations were reduced by 24.4 %, and model size shrank by 40.7 %. Inference latency for a single image on an edge GPU fell below 35 ms. Extensive testing confirmed that YOLO-MSLP runs in real time on embedded devices, providing a low-cost, highly reliable tool for early warning, precise spraying, and green control of rice pests. The model is expected to play a pivotal role in large-scale smart-agriculture deployments and to advance the sustainable development of the rice industry.
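
As an illustration of the knowledge-distillation step mentioned above, the following minimal sketch shows the classic softened-logit formulation, in which a student is trained to match the temperature-scaled output distribution of a teacher. The abstract does not state which distillation loss YOLO-MSLP uses, so this formulation is an assumption. The function names (`softmax`, `kl_divergence`, `distillation_loss`), the temperature of 4.0, the weighting `alpha`, and the toy logits are all hypothetical choices made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft term (match the teacher) and a hard term (match the label).

    Assumed form: alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # T^2 keeps the soft-term gradients on a scale comparable to the hard term.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    hard_term = -math.log(softmax(student_logits)[true_class] + 1e-12)
    return alpha * soft_term + (1 - alpha) * hard_term


# Toy class logits for one detection (hypothetical values, three pest classes).
teacher = [6.0, 2.0, 0.5]
close_student = [5.5, 2.2, 0.4]  # roughly agrees with the teacher
far_student = [1.0, 3.0, 2.0]    # disagrees with the teacher and the label

print(distillation_loss(close_student, teacher, true_class=0))
print(distillation_loss(far_student, teacher, true_class=0))
```

The student whose logits track the teacher's incurs the smaller loss, which is the signal that pushes a lightweight student towards the teacher's representations. In a detection model the same idea would typically be applied per prediction, or to intermediate feature maps, rather than to a single logit vector.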