
Residual bait detection in aquaculture scenes based on YOLO-Bait with multi-scale feature enhancement

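  • Summary: Residual bait particles in aquaculture scenes are tiny and low in contrast, and the images contain many interfering objects and much noise, so detection accuracy is insufficient. To address this, this study improves the YOLOv8n baseline and proposes YOLO-Bait, a residual bait detection model with multi-scale feature enhancement. First, central difference convolution (CDC) is introduced to build a C2f-CDC (CSP-Bottleneck with 2 CDC) module, which aggregates the intensity and gradient information of points in the feature map to strengthen the representation of residual bait particles at different scales. Second, a multi-scale features aggregation (MSFA) module, built by combining atrous (dilated) convolution with a spatial pyramid, replaces the spatial pyramid pooling-fast (SPPF) module of YOLOv8n to strengthen the interaction and fusion of features at different levels while retaining key low-level features. Third, the neck is redesigned around an aggregate-and-distribute mechanism, and the detection layer for large objects is removed while one for small objects is added, raising detection accuracy for residual bait particles. Finally, the scaled intersection over union (SIoU) loss function, which accounts for both distance loss and shape loss, is adopted to improve robustness. Experiments on a residual bait dataset from real aquaculture scenes show that YOLO-Bait reaches a precision, recall, $\mathrm{mAP}_{0.5}$, and $\mathrm{mAP}_{0.5\text{-}0.95}$ of 94.6%, 93.7%, 95.7%, and 65.4%, respectively, which are 2.6, 8.6, 4.1, and 7.6 percentage points higher than the baseline; meanwhile, its parameter count and model size are 62.1% and 57.8% lower than those of the baseline. Compared with the existing residual bait detection models YOLOv5s-CAGSDL, YOLO-feed, and YOLO-BaitScan, its $\mathrm{mAP}_{0.5}$ is higher by 1.6, 2.3, and 1.9 percentage points, respectively. In deployment tests on an Android mobile terminal, YOLO-Bait runs at an average of 28.1 frames/s with a detection rate of 92.1%, 4.8 percentage points higher than the baseline. YOLO-Bait can detect underwater residual bait particles quickly and accurately, providing technical support for the quantitative assessment of residual bait in aquaculture.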
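The central difference convolution behind C2f-CDC can be illustrated with a minimal single-channel sketch. It assumes the CDC formulation commonly cited from the face anti-spoofing literature, y(p0) = θ·Σ w(pn)·(x(p0 + pn) − x(p0)) + (1 − θ)·Σ w(pn)·x(p0 + pn), which simplifies to a vanilla convolution minus θ·x(p0)·Σ w(pn). The function names, zero padding, and the default θ = 0.7 are illustrative assumptions, not the authors' C2f-CDC code, which would operate on multi-channel tensors inside a CSP bottleneck.

```python
def conv2d_same(x, w):
    """Single-channel cross-correlation with an odd k x k kernel and zero padding."""
    h, wd, k = len(x), len(x[0]), len(w)
    r = k // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - r, j + v - r
                    if 0 <= ii < h and 0 <= jj < wd:
                        acc += w[u][v] * x[ii][jj]
            out[i][j] = acc
    return out


def cdc2d_reference(x, w, theta=0.7):
    """CDC computed literally from its definition (assumed formulation):
    theta * sum w*(x_neighbor - x_center) + (1 - theta) * sum w*x_neighbor."""
    h, wd, k = len(x), len(x[0]), len(w)
    r = k // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            diff = plain = 0.0
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - r, j + v - r
                    nb = x[ii][jj] if 0 <= ii < h and 0 <= jj < wd else 0.0
                    diff += w[u][v] * (nb - x[i][j])  # gradient (difference) term
                    plain += w[u][v] * nb             # intensity term
            out[i][j] = theta * diff + (1 - theta) * plain
    return out


def cdc2d(x, w, theta=0.7):
    """Equivalent fast form: vanilla conv minus theta * x(p0) * sum(w)."""
    vanilla = conv2d_same(x, w)
    w_sum = sum(sum(row) for row in w)
    return [[vanilla[i][j] - theta * x[i][j] * w_sum for j in range(len(x[0]))]
            for i in range(len(x))]


# Usage: both forms give the same value at the centre pixel.
x = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(5)]
w = [[1.0, 2.0, 0.0], [-1.0, 3.0, 1.0], [0.5, -2.0, 1.0]]
print(cdc2d(x, w)[2][2], cdc2d_reference(x, w)[2][2])
```

The fast form is why CDC adds little cost: the gradient term reduces to one extra per-pixel product. With θ = 1 on a constant input, interior outputs are zero, so the operator responds only to local intensity differences, which is consistent with the gradient sensitivity the summary attributes to CDC for small, low-contrast particles.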

     

    Abstract: Residual bait particles are difficult to detect in complex aquaculture environments because of their small size, low contrast against the background, and interference such as illumination changes and background noise, and conventional detection methods perform poorly on them. In this study, an improved model, YOLO-Bait, was built on YOLOv8n to detect residual bait in aquaculture scenes, using multi-scale feature enhancement to optimize both local and global feature extraction. Specifically, 1) a C2f-CDC module (CSP-Bottleneck with two central difference convolution (CDC) layers) was developed to replace the original C2f module of the baseline. By aggregating both intensity and gradient information from the feature maps, it strengthens the feature representation and improves sensitivity to residual bait particles. 2) A multi-scale feature aggregation (MSFA) module combining atrous (dilated) convolution with a spatial pyramid structure replaced the spatial pyramid pooling-fast (SPPF) module of YOLOv8n. It mitigates the semantic gaps between feature levels and preserves low-level spatial features to facilitate cross-level feature interaction. 3) The neck was redesigned to improve feature fusion while reducing the parameter count for fast inference: an aggregate-and-distribute mechanism globally integrates multi-level features and injects the fused information back into each level. In addition, the large-object detection head was removed and a small-object detection head was added, yielding high accuracy on fine bait particles. 4) The scaled intersection over union (SIoU) loss function, which incorporates both distance and shape constraints, was adopted to make bounding box regression more robust.
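The atrous (dilated) spatial pyramid idea behind MSFA can be sketched as parallel 3×3 convolutions applied to the same map at several dilation rates. A k × k kernel with dilation d spans k + (k − 1)(d − 1) pixels per side, so larger rates enlarge the receptive field without adding weights. The rates (1, 2, 3), the function names, and the omission of pooling, normalization, activation, and the final fusion convolution are assumptions for illustration; the abstract does not specify MSFA's internal configuration.

```python
def effective_kernel_size(k, dilation):
    """Span of a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d_same(x, w, dilation=1):
    """Single-channel atrous convolution with zero padding that keeps the input size."""
    h, wd, k = len(x), len(x[0]), len(w)
    r = (k // 2) * dilation  # half-span of the dilated kernel
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Taps are spaced `dilation` pixels apart instead of adjacent.
                    ii, jj = i + u * dilation - r, j + v * dilation - r
                    if 0 <= ii < h and 0 <= jj < wd:
                        acc += w[u][v] * x[ii][jj]
            out[i][j] = acc
    return out


def atrous_pyramid(x, kernels, rates=(1, 2, 3)):
    """Hypothetical pyramid: one kernel and one dilation rate per branch, same input.
    A real module would concatenate and fuse these branches (e.g. with a 1x1 conv);
    that step, plus normalization and activation, is omitted here."""
    return [dilated_conv2d_same(x, w, d) for w, d in zip(kernels, rates)]


# Usage: a single bright pixel reveals each branch's sampling pattern.
delta = [[0.0] * 9 for _ in range(9)]
delta[4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
for d, m in zip((1, 2, 3), atrous_pyramid(delta, [ones] * 3)):
    print(d, effective_kernel_size(3, d), sum(map(sum, m)))
```

On a single-pixel input, the rate-2 branch spreads the response onto a 5×5 grid with gaps, showing how each branch samples context at a different spacing before the branches are fused.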
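The SIoU loss in item 4) can be sketched for a single box pair. This follows the SIoU formulation as it is commonly implemented: an angle cost Λ = 1 − 2·sin²(arcsin(x) − π/4), where x is the sine of the smaller angle between the centre offset and an axis; a distance cost Δ = Σ over t ∈ {x, y} of (1 − exp(−(2 − Λ)·ρ_t)), where ρ_t is the squared centre offset normalized by the width or height of the smallest enclosing box; a shape cost Ω = Σ over t ∈ {w, h} of (1 − exp(−ω_t))^θ; and a total loss L = 1 − IoU + (Δ + Ω)/2. The (x1, y1, x2, y2) box format, θ = 4, and eps are assumptions, and this is not the authors' training code.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU loss for one pair of boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Smallest enclosing box, used to normalize the centre offset.
    cw = max(px2, tx2) - min(px1, tx1) + eps
    ch = max(py2, ty2) - min(py1, ty1) + eps

    # Angle cost: 0 when the centres are axis-aligned, 1 at 45 degrees.
    dx = (tx1 + tx2 - px1 - px2) / 2
    dy = (ty1 + ty2 - py1 - py2) / 2
    sigma = math.hypot(dx, dy) + eps
    sin_alpha = min(abs(dx), abs(dy)) / sigma
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, modulated by the angle cost through gamma = 2 - Lambda.
    gamma = 2 - angle_cost
    rho_x, rho_y = (dx / cw) ** 2, (dy / ch) ** 2
    distance_cost = 2 - math.exp(-gamma * rho_x) - math.exp(-gamma * rho_y)

    # Shape cost on relative width and height mismatch.
    omega_w = abs(pw - tw) / max(pw, tw, eps)
    omega_h = abs(ph - th) / max(ph, th, eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


# Usage: a box shifted slightly to the right of its target.
print(round(siou_loss((0, 0, 2, 2), (0.5, 0, 2.5, 2)), 4))
```

Unlike plain IoU loss, the distance term still produces a useful penalty when boxes do not overlap, which matters for tiny particles whose predicted and true boxes can easily miss each other entirely.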
A series of experiments was conducted on residual bait datasets from the recirculating aquaculture of Nibea albiflora and the pond culture of Litopenaeus vannamei. YOLO-Bait achieved a precision of 94.6%, a recall of 93.7%, a mean average precision at an IoU (intersection over union) threshold of 0.5 of 95.7%, and a mean average precision over IoU thresholds 0.5-0.95 of 65.4%, improvements of 2.6, 8.6, 4.1, and 7.6 percentage points, respectively, over the baseline YOLOv8n. The parameter count and model size were reduced to 1.14 M and 2.7 MB, decreases of 62.1% and 57.8%, respectively, relative to YOLOv8n. Compared with the mainstream models YOLOv9n, Gold-YOLO, YOLOv11, and YOLOv12, and the existing bait detection models YOLOv5s-CAGSDL, YOLO-feed, and YOLO-BaitScan, YOLO-Bait improved the mean average precision at IoU 0.5 by 3.1, 0.8, 1.1, 0.9, 1.6, 2.3, and 1.9 percentage points, respectively. Moreover, optimizing the C2f module with CDC detected small bait particles better than the alternative convolution strategies, namely partial convolution (PC), deformable convolution (DCN), strip convolution (SC), and pixel difference convolution (PDC), improving the mean average precision at IoU 0.5 by 1.1, 0.9, 1.1, and 2.0 percentage points, respectively. Heatmap visualization showed that the feature maps concentrated on bait particle regions and suppressed background interference. Ablation studies confirmed the individual contribution of each component, and together the C2f-CDC module, MSFA module, redesigned neck, and SIoU loss function improved both accuracy and inference efficiency.
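The precision, recall, and mean average precision figures above follow standard detection metrics, which can be sketched for one class and one image. Detections are greedily matched to ground-truth boxes at an IoU threshold, AP is the area under the interpolated precision-recall curve (all-point interpolation), and the 0.5-0.95 metric averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05. Treating residual bait as a single class (so mAP equals AP) and all function names are assumptions; the abstract does not state which interpolation was used, and COCO-style evaluators typically use 101-point interpolation, so values from this sketch may differ slightly.

```python
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.50 ... 0.95


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets, gts, iou_thr):
    """Greedy matching: in descending score order, each (score, box) detection
    claims the unmatched ground-truth box with the highest IoU, if >= iou_thr."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    used, flags = set(), []
    for _, box in dets:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in used and box_iou(box, g) > best_iou:
                best_iou, best_j = box_iou(box, g), j
        if best_j >= 0 and best_iou >= iou_thr:
            used.add(best_j)
            flags.append(True)   # true positive
        else:
            flags.append(False)  # false positive
    return [d[0] for d in dets], flags


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP from ranked detections."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        tp, fp = (tp + 1, fp) if is_tp[i] else (tp, fp + 1)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def map_50_95(dets, gts):
    """Mean of AP over the ten IoU thresholds (single class)."""
    aps = [average_precision(*match_detections(dets, gts, t), len(gts))
           for t in IOU_THRESHOLDS]
    return sum(aps) / len(aps)


# Usage: one detection overlapping its ground truth with IoU 0.72.
dets, gts = [(0.9, (0, 0, 10, 10))], [(0, 0, 10, 7.2)]
print(average_precision(*match_detections(dets, gts, 0.5), 1), map_50_95(dets, gts))
```

In the usage case, the detection counts as a true positive only at the five thresholds from 0.50 to 0.70, so its AP at IoU 0.5 is 1.0 while its mean over 0.5-0.95 is 0.5. This is why the 0.5-0.95 metric is much stricter about localization and typically reads far lower than the IoU 0.5 metric.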
In deployment tests on mobile devices, YOLO-Bait achieved a residual bait detection rate of 92.1%, 4.8 percentage points higher than the baseline YOLOv8n, at a comparable frame rate. YOLO-Bait can rapidly and accurately locate residual bait particles underwater in aquaculture environments, and these findings can provide reliable technical support for bait assessment in smart fisheries.

     
