Abstract
Detecting residual bait particles in complex aquaculture environments is challenging because the particles are small, have low contrast against the background, and are subject to interference such as uneven illumination and background noise. Conventional detection methods therefore perform suboptimally. In this study, an improved model, YOLO-Bait, was built on YOLOv8n to detect residual bait in aquaculture scenes. Multi-scale feature enhancement was incorporated to optimize both local and global feature extraction. Specifically, 1) a module based on central difference convolution (CDC), termed C2f-CDC (CSP-Bottleneck with two CDC layers), was developed to replace the original C2f module in the baseline architecture. The modified module aggregates both intensity and gradient information from the feature maps, which enhances the feature representation and improves the sensitivity to residual bait particles. 2) A multi-scale feature aggregation (MSFA) module was constructed by integrating atrous convolution with a spatial pyramid structure. It replaced the original spatial pyramid pooling fast (SPPF) module in YOLOv8n to mitigate the semantic gaps between feature levels, while preserving low-level spatial features to facilitate cross-level feature interaction. 3) The neck architecture was redesigned to improve feature fusion while reducing the parameter count and keeping the inference speed high. An aggregate-and-distribute mechanism integrates the multi-level features globally and injects the fused information back into each layer. Additionally, the detection heads for large targets were replaced with heads for smaller targets, which improved the accuracy in detecting fine bait particles. 4) The scaled intersection over union (SIoU) loss function was adopted to incorporate both distance and shape constraints and thus make bounding box regression more robust.
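The way central difference convolution combines intensity and gradient information can be illustrated with a minimal sketch. It assumes one common formulation of CDC, in which the output equals a vanilla convolution minus theta times the centre pixel times the sum of the kernel weights; the abstract does not state which formulation or which theta value YOLO-Bait uses. The function name `central_difference_conv2d`, the single-channel list-of-lists input, and the default `theta` are hypothetical choices made for illustration only.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Single-channel 'valid' central difference convolution (illustrative).

    Assumed formulation (not taken from the paper):
        y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn)
    theta = 0 gives a vanilla convolution (intensity only);
    theta = 1 gives a pure central-difference (gradient only) response.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    kernel_sum = sum(sum(row) for row in kernel)

    output = []
    for i in range(h - kh + 1):
        out_row = []
        for j in range(w - kw + 1):
            # Intensity term: ordinary weighted sum over the local window.
            vanilla = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            # Gradient term: subtract the centre pixel scaled by the kernel sum.
            center = image[i + kh // 2][j + kw // 2]
            out_row.append(vanilla - theta * center * kernel_sum)
        output.append(out_row)
    return output


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    flat = [[5.0] * 4 for _ in range(4)]           # uniform background
    speck = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]                      # one bright particle
    print(central_difference_conv2d(flat, ones, theta=1.0))
    print(central_difference_conv2d(speck, ones, theta=1.0))
```

With theta set to 1, a uniform region produces a zero response while an isolated bright pixel produces a strong one, which is consistent with the stated aim of raising sensitivity to small, low-contrast particles.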
A series of experiments was conducted on residual bait datasets collected from recirculating aquaculture of Nibea albiflora and pond culture of Litopenaeus vannamei. YOLO-Bait achieved a precision of 94.6%, a recall of 93.7%, a mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@0.5) of 95.7%, and a mean average precision over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) of 65.4%. These values were 2.6, 8.6, 4.1, and 7.6 percentage points higher, respectively, than those of the baseline YOLOv8n model. The model size and parameter count were reduced to 1.14 MB and 2.7 M, decreases of 62.1% and 57.8%, respectively, compared with YOLOv8n. In terms of mAP@0.5, YOLO-Bait outperformed the mainstream models YOLOv9n, Gold-YOLO, YOLOv11, and YOLOv12 and the existing bait detection models YOLOv5s-CAGSDL, YOLO-feed, and YOLO-BaitScan by 3.1, 0.8, 1.1, 0.9, 1.6, 2.3, and 1.9 percentage points, respectively. Moreover, optimizing the C2f module with the CDC strategy detected small bait particles better than the alternative convolutional strategies, namely partial convolution (PC), deformable convolution (DCN), strip convolution (SC), and pixel difference convolution (PDC), with mAP@0.5 improvements of 1.1, 0.9, 1.1, and 2.0 percentage points, respectively. Heatmap visualization showed that the feature maps focused on the bait particle regions and suppressed background interference. Ablation studies, together with the comparison of convolutional strategies, demonstrated the individual contribution of each component. The C2f-CDC module, MSFA module, redesigned neck architecture, and SIoU loss function collectively improved both accuracy and inference efficiency.
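Because the gains over YOLOv8n are reported in percentage points rather than in relative percent, the implied baseline scores can be recovered by simple subtraction. The sketch below does this using only the figures quoted above; the short metric labels (for example, mAP@0.5 for the mean average precision at IoU 0.5) and the helper names are hypothetical, and the derived baseline values are arithmetic consequences of the reported numbers, not separately reported results.

```python
# Scores reported for YOLO-Bait, in percent.
YOLO_BAIT = {
    "precision": 94.6,
    "recall": 93.7,
    "mAP@0.5": 95.7,
    "mAP@0.5:0.95": 65.4,
}

# Reported gains over the YOLOv8n baseline, in percentage points.
GAIN_PP = {
    "precision": 2.6,
    "recall": 8.6,
    "mAP@0.5": 4.1,
    "mAP@0.5:0.95": 7.6,
}


def implied_baseline(scores, gains_pp):
    """Baseline score = improved score minus the percentage-point gain."""
    return {name: round(scores[name] - gains_pp[name], 1) for name in scores}


def relative_gain_percent(score, gain_pp):
    """Relative improvement in percent, which differs from percentage points."""
    baseline = score - gain_pp
    return round(100.0 * gain_pp / baseline, 1)


if __name__ == "__main__":
    for name, value in implied_baseline(YOLO_BAIT, GAIN_PP).items():
        rel = relative_gain_percent(YOLO_BAIT[name], GAIN_PP[name])
        print(f"{name}: implied baseline {value}%, relative gain {rel}%")
```

The distinction matters when comparing models: a gain of 8.6 percentage points in recall corresponds to a somewhat larger relative improvement, because the baseline is below 100%.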
Additional experiments on mobile devices showed that the deployed YOLO-Bait model achieved a detection rate of 92.1% for residual bait, 4.8 percentage points higher than that of the baseline YOLOv8n model, at a comparable frame rate. YOLO-Bait can therefore rapidly and accurately locate residual bait particles underwater in aquaculture environments. These findings can provide reliable technical support for bait assessment in smart fisheries.