Residual bait detection in aquaculture scenes based on YOLO-Bait with multi-scale feature enhancement

李振华; 王骥; 刘雯景; 杨玉强

doi:10.11975/j.issn.1002-6819.202512049

Residual bait detection in aquaculture scenes based on YOLO-Bait with multi-scale feature enhancement

Abstract

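Summary: To address the insufficient detection accuracy caused by the small size and low contrast of residual bait particles in aquaculture scenes, and by the many interfering objects and noise in the images, this study improves on the YOLOv8n baseline and proposes YOLO-Bait, a residual bait detection model with multi-scale feature enhancement. First, central difference convolution (CDC) is introduced to build the C2f-CDC (CSP-Bottleneck with 2 CDC) module, which aggregates the intensity and gradient information of feature points in the feature map to strengthen the representation of residual bait particles at different scales. Second, atrous convolution is combined with a spatial pyramid to build a multi-scale features aggregation (MSFA) module that replaces the spatial pyramid pooling-fast (SPPF) module of YOLOv8n, strengthening the interaction and fusion of features across levels while preserving key low-level features. In addition, the neck network is redesigned around an aggregate-and-distribute mechanism, the large-size detection layer is removed, and a small-size detection layer is added, which improves detection accuracy for residual bait particles. Finally, the scaled intersection over union (SIoU) loss function is adopted; it accounts for both distance loss and shape loss and improves model robustness. Experimental results show that, on a residual bait dataset from real aquaculture scenes, YOLO-Bait achieves precision, recall, $\mathrm{mAP}_{0.5}$, and $\mathrm{mAP}_{0.5\text{-}0.95}$ of 94.6%, 93.7%, 95.7%, and 65.4%, respectively, which are 2.6, 8.6, 4.1, and 7.6 percentage points higher than the baseline model. The parameter count and model size are 62.1% and 57.8% lower than those of the baseline model, respectively. Compared with the existing residual bait detection models YOLOv5s-CAGSDL, YOLO-feed, and YOLO-BaitScan, its $\mathrm{mAP}_{0.5}$ is higher by 1.6, 2.3, and 1.9 percentage points, respectively. In a deployment test on an Android mobile device, YOLO-Bait reaches an average frame rate of 28.1 frames/s and a detection rate of 92.1%, which is 4.8 percentage points higher than the baseline model. The proposed YOLO-Bait model can detect underwater residual bait particles quickly and accurately, and it provides technical support for the quantitative assessment of residual bait in aquaculture scenes.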

Abstract: Detecting residual bait particles in complex aquaculture environments is challenging because the particles are small, have low contrast against the background, and are subject to interference such as illumination variation and background noise. As a result, conventional detection methods often perform poorly. To address these issues, this paper proposes YOLO-Bait, an enhanced detection model built upon YOLOv8n that incorporates optimized local and global feature extraction mechanisms. First, a central difference convolution (CDC)-based module, termed C2f-CDC (CSP-Bottleneck with two CDC layers), was developed to replace the original C2f module in the baseline architecture. This modification enhances feature representation by aggregating both intensity and gradient information from the feature maps, thereby improving the model's sensitivity to residual bait particles. Second, a multi-scale feature aggregation (MSFA) module was constructed by integrating atrous convolution with a spatial pyramid structure, replacing the original spatial pyramid pooling-fast (SPPF) module in YOLOv8n. This design mitigates semantic gaps across feature levels, facilitates cross-level feature interaction, and preserves critical low-level spatial details. Third, to improve feature fusion in the neck while reducing the parameter count and increasing inference speed, the neck architecture was redesigned using an aggregate-and-distribute mechanism. This redesign integrates multi-level features globally and injects the fused information back into each layer. Additionally, removing the large-size detection layer and adding a small-size detection layer improved detection accuracy for fine bait particles. Finally, the scaled intersection over union (SIoU) loss function was adopted, incorporating both distance and shape constraints to make bounding box regression more robust.
Comparative experiments on residual bait datasets collected from recirculating aquaculture of Nibea albiflora and pond culture of Litopenaeus vannamei showed that YOLO-Bait achieved a precision of 94.6%, a recall of 93.7%, a mean average precision at an intersection over union (IoU) threshold of 0.5 of 95.7%, and a mean average precision across IoU thresholds of 0.5-0.95 of 65.4%. These values are 2.6, 8.6, 4.1, and 7.6 percentage points higher, respectively, than those of the baseline YOLOv8n model. The parameter count and model size were reduced to 1.14 M and 2.7 MB, decreases of 62.1% and 57.8%, respectively, relative to YOLOv8n. Compared with the mainstream models YOLOv9n, Gold-YOLO, YOLOv11, and YOLOv12, and with the existing bait detection models YOLOv5s-CAGSDL, YOLO-feed, and YOLO-BaitScan, YOLO-Bait improved the mean average precision at IoU 0.5 by 3.1, 0.8, 1.1, 0.9, 1.6, 2.3, and 1.9 percentage points, respectively. Moreover, the C2f module optimized with the CDC strategy detected small bait particles better than variants built on alternative convolutional strategies, namely Partial Convolution (PC), Deformable Convolution (DCN), Strip Convolution (SC), and Pixel Difference Convolution (PDC), with mean average precision at IoU 0.5 higher by 1.1, 0.9, 1.1, and 2.0 percentage points, respectively. Heatmaps generated with the different convolutional strategies showed that the CDC-optimized C2f module produced feature maps that focused on bait particle regions while suppressing background interference. Ablation studies confirmed the contribution of each proposed component: the C2f-CDC module, the MSFA module, the redesigned neck architecture, and the SIoU loss function together improved both detection accuracy and inference efficiency.
Additional experiments on mobile devices showed that the deployed YOLO-Bait model achieved a detection rate of 92.1% for residual bait, which is 4.8 percentage points higher than that of the baseline YOLOv8n model, while maintaining a comparable frame rate. These findings indicate that YOLO-Bait can localize underwater residual bait particles rapidly and accurately, providing technical support for quantitative bait assessment in aquaculture operations and contributing to smart fisheries management systems.
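
The abstract describes CDC as aggregating intensity and gradient information from the feature maps. The sketch below illustrates that idea with the commonly used formulation of central difference convolution, in which the output is the plain convolution minus `theta` times the center value multiplied by the sum of the kernel weights. It is a single-channel, pure-Python illustration and not the authors' C2f-CDC module; the function names and the value `theta=0.7` are assumptions made here, since the abstract does not state them.

```python
def conv2d_same(image, kernel):
    """Plain 2-D cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # taps outside the image count as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def central_difference_conv2d(image, kernel, theta=0.7):
    """Blend an intensity term and a gradient term (theta is an assumed setting).

    y(p0) = theta * sum_n w(pn) * (x(p0 + pn) - x(p0))
            + (1 - theta) * sum_n w(pn) * x(p0 + pn)
          = conv(x, w)(p0) - theta * x(p0) * sum_n w(pn)
    theta = 0 gives the plain convolution; theta = 1 keeps only local differences.
    """
    vanilla = conv2d_same(image, kernel)
    kernel_sum = sum(sum(row) for row in kernel)
    h, w = len(image), len(image[0])
    return [
        [vanilla[i][j] - theta * kernel_sum * image[i][j] for j in range(w)]
        for i in range(h)
    ]


# Hypothetical toy input: one bright "particle" pixel on a dark background.
particle = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
response = central_difference_conv2d(particle, ones, theta=1.0)
```

With `theta=1.0` a uniform region produces a zero response away from the border, while an isolated bright pixel produces a strong one, which is the sense in which the gradient term emphasizes small, low-contrast targets over flat background.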
