
A marine benthic organism detection algorithm based on YOLOv11n-MBOD

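  • Abstract: Existing detection algorithms achieve limited recognition accuracy in complex seafloor environments, where marine benthic organisms have hard-to-capture edge details, vary sharply in scale, and often appear as densely distributed small targets. To address this, the study proposes YOLOv11n-MBOD (marine benthic organism detection), a detection method based on an improved YOLOv11n. First, a context guide block downsampling (CGDown) module is combined with an improved deformable adaptive fusion downsampling (DAFDown) module to form a hierarchical cooperative downsampling mechanism that strengthens feature learning for small targets and irregular shapes. Second, a lightweight reparameterized progressive convolution block (ReProBlock) is designed to improve the capture of local details and edge contours. In addition, a task-aware interactive head (TAI-Head) is proposed; by promoting information interaction between the localization and classification tasks, it achieves a deeper understanding of, and compensation for, complex features. Finally, a composite loss function built from the normalized Wasserstein distance (NWD) and the weighted minimum point distance intersection over union loss (Wise-MPDIoU) is introduced to improve the localization accuracy and stability of the final predicted boxes. The results show that the precision, recall, and mean average precision (mAP50) of YOLOv11n-MBOD are 1.5, 2.6, and 2.6 percentage points higher, respectively, than those of the baseline model (YOLOv11n), with only 2.37 M parameters, outperforming mainstream deep detection algorithms. Visualization results show that YOLOv11n-MBOD performs well in practical detection and copes effectively with complex and variable seafloor environments. The proposed method improves the detection accuracy of seafloor targets and can provide technical support for aquaculture automation and marine ecological monitoring.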
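The abstract does not describe the internals of ReProBlock, so the following is only a minimal sketch of the general idea behind structural reparameterization, not the authors' module. It assumes a toy single-channel setting with a 3×3 branch and a parallel 1×1 branch (both hypothetical) and shows that the two branches can be folded into one 3×3 kernel for inference without changing the output. The helper names `conv2d_same` and `merge_branches` are illustrative.

```python
def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding.

    x is a list of rows; k is a square kernel with odd side length.
    The output has the same height and width as x.
    """
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = s
    return out


def merge_branches(k3, k1):
    """Fold a parallel 1x1 branch into a 3x3 kernel.

    A 1x1 kernel equals a 3x3 kernel that is zero everywhere except
    the centre, so the merged kernel is k3 with k1 added at the centre.
    """
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Hypothetical toy input and branch weights.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, -0.2], [0.1, 0.0, -0.1]]
k1 = [[0.3]]

# Training-time form: two branches evaluated separately and summed.
two_branch = [
    [a + b for a, b in zip(row3, row1)]
    for row3, row1 in zip(conv2d_same(x, k3), conv2d_same(x, k1))
]

# Inference-time form: one convolution with the merged kernel.
single = conv2d_same(x, merge_branches(k3, k1))

same = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(two_branch, single)
    for a, b in zip(row_a, row_b)
)
print(same)
```

Because convolution is linear in the kernel, the merged form reproduces the two-branch output up to floating-point rounding. This is why a reparameterized block can use extra branches during training while remaining lightweight at inference.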

     

    Abstract: Seafloor object detection supports the development of the marine economy through applications such as ecological monitoring, sustainable aquaculture management, and marine resource exploration. However, marine benthic targets are difficult to detect: their edge details are hard to capture, their scale varies widely, and small targets are often densely distributed. Existing object detection algorithms are not adapted to such complex underwater scenes, which limits their detection accuracy and reliability in practice. To address these challenges, this study proposed YOLOv11n-MBOD, an enhanced marine benthic organism detection (MBOD) algorithm based on YOLOv11n, in which several modules of the original architecture were replaced. First, context guide block downsampling (CGDown) was combined with an improved deformable adaptive fusion downsampling (DAFDown) module to form a hierarchical cooperative downsampling mechanism. This mechanism was designed to strengthen feature learning for small targets and irregularly shaped objects, and thereby the feature extraction capability of the network as a whole. Second, a lightweight feature enhancement module, the reparameterized progressive convolution block (ReProBlock), was developed. It optimizes the extraction and fusion of features ranging from fine-grained local details to broader multi-region information, improving the model's ability to capture the subtle textures, edge contours, and structural information of benthic organisms.
In addition, a task-aware interactive head (TAI-Head) was proposed to promote information interaction between the localization and classification tasks, enabling a deeper understanding of, and compensation for, complex underwater features. Finally, a composite loss function combining normalized Wasserstein distance (NWD) and weighted minimum point distance intersection over union (Wise-MPDIoU) was introduced to improve the accuracy and stability of the predicted bounding boxes. Experimental results showed that the precision, recall, and mean average precision (mAP50) of YOLOv11n-MBOD were 1.5, 2.6, and 2.6 percentage points higher, respectively, than those of the baseline model (YOLOv11n). The number of parameters was 0.21 M lower than that of the baseline, while the computational cost rose slightly to 9.6 GFLOPs. Compared with Faster R-CNN and YOLOv11s, which have large parameter counts and high computational costs, YOLOv11n-MBOD significantly reduced model overhead while achieving precision, recall, and mAP50 that were 0.6–2.5, 0.1–7.8, and 0.3–6.6 percentage points higher, respectively. Compared with CEH-YOLO, a strong model designed for the same seafloor detection task, the proposed model was higher by 0.8, 0.3, and 0.6 percentage points in precision, recall, and mAP50, respectively. Compared with mainstream models of similar parameter count and computational cost (YOLOv5n, YOLOv8n, YOLOv10n, YOLO12, and YOLO13), YOLOv11n-MBOD was 1.2–3.4, 2.4–4.4, and 2.8–4.3 percentage points higher in precision, recall, and mAP50, respectively.
Visualization results show that in practical detection scenarios with blurred edges, low-light occlusion, dense small objects, and overlapping objects, YOLOv11n-MBOD achieved the best detection performance, with no missed detections or false positives in these scenes. This indicates that the model adapts to previously unseen complex seafloor scenes and supports its accuracy, robustness, and generalization ability. The proposed method can therefore cope with a variety of complex seafloor conditions and provide technical support for automated aquaculture operations and marine ecological monitoring.
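The abstract names the two components of the composite loss but not how they are combined. The sketch below is a hedged illustration that uses the commonly published definitions of NWD (boxes modelled as 2D Gaussians) and MPDIoU (IoU penalized by normalized corner distances). The mixing weight `alpha`, the NWD constant `c`, and the function names are assumptions, and the "Wise" weighting of Wise-MPDIoU is deliberately left out because its form is not given here.

```python
import math


def iou_xyxy(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou(a, b, img_w, img_h):
    """IoU minus the squared top-left and bottom-right corner distances,
    each normalized by the squared image diagonal."""
    norm = img_w ** 2 + img_h ** 2
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # top-left corners
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2  # bottom-right corners
    return iou_xyxy(a, b) - d1 / norm - d2 / norm


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between two boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w/2, h/2). c is a dataset-dependent constant; the
    default here is only a placeholder assumption.
    """
    acx, acy, aw, ah = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2, a[2] - a[0], a[3] - a[1]
    bcx, bcy, bw, bh = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2, b[2] - b[0], b[3] - b[1]
    w2 = math.sqrt(
        (acx - bcx) ** 2 + (acy - bcy) ** 2
        + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    )
    return math.exp(-w2 / c)


def composite_loss(pred, target, img_w, img_h, alpha=0.5, c=12.8):
    """Assumed convex combination of the NWD and MPDIoU loss terms."""
    nwd_term = 1.0 - nwd(pred, target, c)
    mpdiou_term = 1.0 - mpdiou(pred, target, img_w, img_h)
    return alpha * nwd_term + (1.0 - alpha) * mpdiou_term


# Two small, non-overlapping boxes in a hypothetical 640x640 image.
box_a = (0.0, 0.0, 4.0, 4.0)
box_b = (6.0, 0.0, 10.0, 4.0)
print(iou_xyxy(box_a, box_b), nwd(box_a, box_b), composite_loss(box_a, box_b, 640, 640))
```

For the two disjoint boxes, IoU is zero and carries no information about how far apart they are, whereas NWD still varies smoothly with their distance. This property is the usual motivation for adding an NWD term when small targets are densely distributed.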

     
