Abstract:
Seafloor object detection supports the development of the marine economy through applications such as ecological monitoring, sustainable aquaculture management, and marine resource exploration. However, marine benthic targets are difficult to detect because their edge details are hard to capture, their scales vary widely, and small targets are densely distributed across the seafloor. Traditional object detection algorithms are not specifically adapted to such complex underwater scenarios, which limits their detection accuracy and reliability in practical applications. To address these challenges, this study proposed an enhanced marine benthic organism detection (MBOD) algorithm based on YOLOv11n, named YOLOv11n-MBOD, in which several modules of the original YOLOv11n architecture were replaced. Firstly, context guide block downsampling (CGDown) was introduced and combined with an improved deformable adaptive fusion downsampling (DAFDown) module to form a hierarchical cooperative downsampling mechanism. This mechanism strengthens the model’s ability to learn features from small targets and irregularly shaped objects, thereby enhancing the feature extraction capability of the network. Secondly, a lightweight feature enhancement module, the reparameterized progressive convolution block (ReProBlock), was developed. This module optimizes the extraction and fusion of features from fine-grained local details to global multi-region information, improving the model’s ability to capture the subtle textures, edge contours, and structural information of benthic organisms.
In addition, a task-aware interactive head (TAI-Head) was proposed to facilitate information interaction between the localization and classification tasks, enabling the network to better represent and compensate for complex underwater features. Finally, a composite loss function combining normalized Wasserstein distance (NWD) and weighted minimum point distance intersection over union (Wise-MPDIoU) was introduced to improve both the accuracy and stability of predicted bounding boxes. Experimental results showed that the precision, recall, and mean average precision (mAP50) of the YOLOv11n-MBOD model increased by 1.5, 2.6, and 2.6 percentage points, respectively, compared with the baseline model (YOLOv11n). The number of parameters was reduced by 0.21 M relative to the baseline, while the computational cost increased slightly to 9.6 GFLOPs. Compared with Faster R-CNN and YOLOv11s, which have large parameter counts and high computational costs, YOLOv11n-MBOD significantly reduces the model overhead, while its precision, recall, and mAP50 are 0.6–2.5, 0.1–7.8, and 0.3–6.6 percentage points higher, respectively. Compared with CEH-YOLO, a comparable seafloor object detection model, the precision, recall, and mAP50 of the proposed model are higher by 0.8, 0.3, and 0.6 percentage points, respectively. Furthermore, compared with mainstream models of the same scale, with similar parameter counts and computational costs (YOLOv5n, YOLOv8n, YOLOv10n, YOLO12, and YOLO13), the precision, recall, and mAP50 of YOLOv11n-MBOD are 1.2–3.4, 2.4–4.4, and 2.8–4.3 percentage points higher, respectively.
Visualization results show that in practical detection scenarios involving complex seafloor environments with blurred edges, low-light occlusion, dense small objects, and overlapping objects, YOLOv11n-MBOD achieved the best detection performance among the compared models, with no missed detections or false positives in the visualized examples. These results suggest that the model adapts well to unseen complex seafloor scenarios and support its accuracy, robustness, and generalization ability. The proposed method can therefore cope with a variety of complex seafloor conditions and provides technical support for automated aquaculture operations and marine ecological monitoring.
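
To make the composite bounding-box loss mentioned above more concrete, the following is a minimal sketch of its two ingredients as they are commonly defined in the literature: NWD treats each box as a 2D Gaussian and maps the Wasserstein distance through an exponential, and MPDIoU subtracts normalized squared corner distances from the IoU. The Wise-style dynamic weighting is omitted because the abstract does not specify it. The function names, the mixing weight alpha, and the normalizing constant c are illustrative assumptions, not the formulation used in this study.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent constant; 12.8 is only an assumed placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the Gaussians fitted to the boxes.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def mpdiou(box_a, box_b, img_w, img_h):
    """IoU minus normalized squared distances between matching corners."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm

def composite_loss(pred, target, img_w, img_h, alpha=0.5, c=12.8):
    """Assumed convex mix of the NWD loss and the MPDIoU loss (alpha is hypothetical)."""
    loss_nwd = 1.0 - nwd(pred, target, c)
    loss_mpdiou = 1.0 - mpdiou(pred, target, img_w, img_h)
    return alpha * loss_nwd + (1.0 - alpha) * loss_mpdiou

if __name__ == "__main__":
    target = (10.0, 10.0, 4.0, 4.0)
    shifted = (12.0, 10.0, 4.0, 4.0)
    print(composite_loss(target, target, 640, 640))
    print(composite_loss(shifted, target, 640, 640))
```

For a small target, a shift of a few pixels sharply reduces IoU, whereas NWD decays smoothly with the center offset; this is the usual motivation for pairing an NWD term with an IoU-based term when small objects are dense.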