
Multi-Target Segmentation, Recognition, and Localization for an Asparagus-Harvesting Robot Based on YOLO11n

• Abstract: To address the multi-target segmentation and recognition difficulties that arise during asparagus-harvesting robot operation, where tender asparagus stems are slender, occlude and overlap one another, and suffer interference from mother stems, this study takes YOLO11n-seg as the baseline and proposes an improved model, YOLO11n-SAL. A Multi-scale Edge Enhancement Module (MEEM) is introduced to decompose and enhance image edge features, improving boundary segmentation accuracy and strengthening detection of slender targets; a Separated and Enhancement Attention Module (SEAM) is also incorporated to perceive and fuse multi-scale features of occluded asparagus, improving detection and recognition under occlusion. Experiments show that the improved YOLO11n-SAL model gains significantly on all evaluation metrics: bounding-box detection precision, recall, mAP0.5(Box), and mAP0.5-0.95(Box) reach 94.2%, 83.1%, 91.2%, and 76.2%; mask segmentation precision, recall, mAP0.5(Mask), and mAP0.5-0.95(Mask) reach 93.4%, 77.9%, 90.7%, and 62.7%. Heatmap analysis and comparisons of detection and segmentation results show that YOLO11n-SAL clearly improves the perception of asparagus edge features in different scenes and multi-target segmentation and recognition under occlusion; compared with the baseline model it detects and segments better, copes effectively with interference in complex conditions, and significantly raises asparagus segmentation and recognition accuracy across scenes. To verify recognition and localization in actual deployment, asparagus recognition, localization, and harvest-grasping experiments were conducted with a depth camera and a robotic arm; the localization success rate reached 90% and grasping performed well, providing reliable technical support for precise harvesting by agricultural robots.
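The abstract describes MEEM only at the level of "decomposing and enhancing edge features at multiple scales". As a minimal sketch of that idea, not the paper's actual module, the PyTorch block below approximates edge detail at each scale as the residual between a feature map and a blurred copy of itself, re-weights each residual, and fuses everything back into the input; the class name, kernel sizes, and fusion layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleEdgeEnhance(nn.Module):
    """Illustrative multi-scale edge-enhancement block (not the paper's MEEM).

    Edge detail at each scale is approximated as the residual between the
    feature map and a locally averaged (blurred) copy of itself; the
    residuals are re-weighted per scale and fused back into the input.
    """

    def __init__(self, channels: int, scales=(3, 5, 7)):
        super().__init__()
        # average pooling with stride 1 acts as a cheap blur at each scale
        self.pools = nn.ModuleList(
            nn.AvgPool2d(k, stride=1, padding=k // 2) for k in scales
        )
        # a 1x1 conv per scale re-weights that scale's edge residual
        self.weights = nn.ModuleList(
            nn.Conv2d(channels, channels, 1) for _ in scales
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * (len(scales) + 1), channels, 1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # feature minus its blur keeps mostly high-frequency (edge) content
        edges = [w(x - p(x)) for p, w in zip(self.pools, self.weights)]
        return self.fuse(torch.cat([x, *edges], dim=1))
```

The block preserves tensor shape, so it could in principle sit between neck layers of a segmentation network: MultiScaleEdgeEnhance(64)(torch.randn(1, 64, 80, 80)) returns a (1, 64, 80, 80) tensor.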

     

Abstract: This research addresses the core technical challenges that currently limit robotic vision systems in automated asparagus harvesting. In their natural growth state, asparagus spears are slender, and when they grow densely in the field the tender stems readily occlude and overlap one another; the stout "mother stems" present at the same time add complex background interference. Together these factors degrade the accuracy of vision-based multi-target segmentation and recognition, which in turn compromises the precise positioning and harvesting capability of the robotic end-effector.

To overcome these obstacles, this study adopts the lightweight instance segmentation model YOLO11n-seg as a baseline and proposes an improved model, YOLO11n-SAL, tailored to slender, occluded targets. Its core architectural advances are two newly introduced modules that strengthen feature extraction and attention.

First, the Multi-scale Edge Enhancement Module (MEEM) is designed and integrated to counter the tendency of the inherently weak edge features of slender asparagus targets to be lost during convolution. By decomposing convolutional feature maps across multiple scales, MEEM extracts and intensifies edge and contour information separately before fusing the features. This raises the model's sensitivity to target boundaries and improves segmentation precision, enhancing its perception of targets with slender morphology.

Second, the Separated and Enhancement Attention Module (SEAM) is introduced to rectify the feature confusion and incomplete information caused by inter-target occlusion. Through attention separation across the channel and spatial dimensions, SEAM lets the model adaptively perceive local and global features of occluded asparagus at varying scales; by selectively enhancing and fusing these features, it focuses on the visible parts of partially masked targets while suppressing background noise and distractors, so detection and recognition remain robust in complex, cluttered environments.
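SEAM's internals are not spelled out in this abstract; the sketch below is one simplified reading of "separation plus enhancement", with a depthwise stage standing in for spatially separated, per-channel attention and a squeeze-and-excitation style gate standing in for channel enhancement. The exponential gate follows published SEAM-style designs, but the exact layer choices here are assumptions.

```python
import torch
import torch.nn as nn

class SeparatedEnhanceAttention(nn.Module):
    """Simplified SEAM-style block (layer choices are assumptions).

    'Separated' stage: a depthwise 3x3 convolution processes each channel's
    spatial layout independently, so visible fragments of an occluded spear
    are emphasized channel by channel before a pointwise 1x1 mixes channels.
    'Enhancement' stage: a squeeze-and-excitation style gate re-weights
    channels globally; the exponential keeps every weight above 1, so
    informative channels are amplified rather than merely preserved.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.separated = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.GELU(),
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, 1),
            nn.GELU(),
            nn.BatchNorm2d(channels),
        )
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.separated(x)
        gate = torch.exp(self.squeeze(y))  # channel weights in (1, e)
        return x * gate
```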
To verify the effectiveness of the proposed model, systematic comparative experiments were conducted. Quantitatively, the improved YOLO11n-SAL achieves significant gains over the baseline on all key indicators. In bounding-box detection, precision reached 94.2%, recall 83.1%, mean average precision at an IoU threshold of 0.5 (mAP0.5(Box)) 91.2%, and mean average precision averaged over IoU thresholds 0.5-0.95 (mAP0.5-0.95(Box)) 76.2%. In the finer-grained instance-mask segmentation task, precision, recall, mAP0.5(Mask), and mAP0.5-0.95(Mask) reached 93.4%, 77.9%, 90.7%, and 62.7%, respectively.

Heatmap analysis and side-by-side comparison of detection and segmentation results further show that YOLO11n-SAL perceives asparagus edge features markedly better across different scenarios and segments and recognizes multiple targets more reliably under occlusion. Compared with the baseline, it yields better detection and segmentation outcomes, handles interference in complex situations effectively, and significantly improves segmentation and recognition accuracy in multi-scenario environments.

Finally, to validate recognition and positioning performance in actual deployment, asparagus recognition, positioning, and harvest-grasping trials were carried out with a depth camera and a robotic arm. The positioning success rate was 90%, with effective harvesting and grasping, confirming that the proposed system provides reliable technical support for precise agricultural robotic harvesting.
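The trials report a 90% positioning success rate, but the geometry pipeline is not detailed in the abstract. A common approach, sketched below with hypothetical intrinsics and hand-eye matrix, back-projects the grasp pixel and its depth reading through the pinhole model into camera coordinates, then maps the point into the arm's base frame.

```python
import numpy as np

def pixel_to_base(u: float, v: float, depth_m: float,
                  fx: float, fy: float, cx: float, cy: float,
                  T_base_cam: np.ndarray) -> np.ndarray:
    """Back-project one pixel with metric depth into the arm's base frame.

    (u, v) is the grasp pixel (e.g., the centroid of a predicted spear mask
    at cutting height), depth_m the depth reading there, fx/fy/cx/cy the
    camera intrinsics, and T_base_cam a 4x4 hand-eye calibration matrix.
    """
    # pinhole back-projection: pixel + depth -> point in camera coordinates
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    p_cam = np.array([x, y, depth_m, 1.0])
    # homogeneous transform into the robot base frame
    return (T_base_cam @ p_cam)[:3]

# made-up intrinsics and an identity hand-eye transform, for illustration only
point = pixel_to_base(640, 360, 0.45,
                      fx=910.0, fy=910.0, cx=640.0, cy=360.0,
                      T_base_cam=np.eye(4))
print(point)  # -> [0.   0.   0.45]
```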

     
