Online detection of wheat lodging areas from the harvester's perspective based on improved DeepLabv3+

王发明; 周洁; 莫昊一; 倪昕东; 陈度; 王玲

doi:10.11975/j.issn.1002-6819.202505196

Abstract

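Abstract: To address missed harvesting, header blockage, and related problems that wheat harvesting robots encounter in lodged areas, this study proposes an online method for detecting wheat lodging areas from the harvester's perspective, based on an improved DeepLabv3+. First, a dataset of 4,250 lodging images from machine-harvesting scenes was built with a vehicle-mounted camera. Second, a lightweight online detection model for lodging areas was proposed. Taking DeepLabv3+ as the baseline, it adopts a MobileViT backbone that combines the strengths of CNNs and Transformers to reduce the parameter count, rebuilds the ASPP (atrous spatial pyramid pooling) module with depthwise separable atrous convolutions to retain multi-scale feature fusion, and embeds a CBAM (convolutional block attention module) attention mechanism to strengthen the perception of edge features in lodging areas. Experimental results show that the model with the MobileViT backbone has a parameter size of 5.47 MB, which is 75.6% and 97.4% smaller than with MobileNetV2 and Xception, respectively. On the self-built dataset, the model achieved a mean intersection over union (mIoU) of 94.10%, a mean pixel accuracy (mPA) of 97.44%, an F1-score of 96.91%, a precision of 96.40%, and a recall of 97.45%, which are 3.70, 2.79, 2.09, 3.05, and 1.00 percentage points higher than the original DeepLabv3+, respectively, with the best segmentation of small regions and boundaries. Finally, an online detection system for wheat lodging areas was developed; it reached a real-time detection rate of 14.23 frames/s, and the relative pixel error of lodging-area detection was below 1%. The results can provide technical support for efficient, low-loss operation control of wheat harvesting robots.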
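The five reported metrics can all be derived from a pixel-level confusion matrix. The following is a minimal sketch, not the authors' evaluation code. It assumes two classes (0 = non-lodged, 1 = lodged, a hypothetical labelling) and macro-averaging over classes; the function names are illustrative. Under this assumption mPA coincides with macro recall, so the averaging used in the paper may differ slightly.

```python
from typing import Dict, List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are annotated classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def _ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 for an empty class."""
    return num / den if den else 0.0


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics (an assumption; the paper's averaging is not stated)."""
    n = len(cm)
    ious, recalls, precisions = [], [], []
    for c in range(n):
        tp = cm[c][c]                               # pixels correctly assigned to class c
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        ious.append(_ratio(tp, tp + fp + fn))
        recalls.append(_ratio(tp, tp + fn))
        precisions.append(_ratio(tp, tp + fp))
    precision = sum(precisions) / n
    recall = sum(recalls) / n
    return {
        "mIoU": sum(ious) / n,
        "mPA": recall,  # per-class pixel accuracy equals per-class recall
        "Precision": precision,
        "Recall": recall,
        "F1": _ratio(2 * precision * recall, precision + recall),
    }


# Toy flattened masks (hypothetical data): 8 pixels, 2 of them misclassified.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
metrics = segmentation_metrics(confusion_matrix(truth, pred, num_classes=2))
```

The "percentage point" gains quoted in the abstract are plain differences between two such percentages (for example, an mIoU of 94.10% against a baseline 3.70 points lower), not relative changes.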

Abstract: Wheat harvesting robots frequently encounter missed harvesting in lodging areas, incomplete harvesting, and header blockages during field operations. To address these challenges, this study developed an online detection method for wheat lodging areas based on an improved DeepLabv3+ semantic segmentation framework. The objective was to enhance the environmental perception and recognition capabilities of harvesting robots operating in complex and variable farmland conditions, thereby supporting efficient, low-loss mechanised harvesting. A dataset of 4,250 high-resolution images of lodging scenarios in wheat fields was constructed using a vehicle-mounted ZED2i binocular RGB camera. Data were collected in June 2023 in the main winter wheat production zone of Xincao Farm, Yancheng, Jiangsu Province, China. To improve the model's generalisation under diverse field conditions, data augmentation strategies were applied, including random colour adjustment, image rotation, and contrast enhancement, to simulate changes in ambient lighting and motion blur induced by machinery vibration. Each image was annotated at the pixel level with the LabelMe tool for semantic segmentation training and evaluation.

A lightweight segmentation model, named MVDC-DeepLabv3+, was developed to meet the demands of real-time field deployment. The model introduced three architectural enhancements. First, the original Xception backbone of DeepLabv3+ was replaced by MobileViT, a lightweight hybrid architecture that combines convolutional neural networks and Transformer components. This change significantly reduced model complexity while improving the capacity to capture both local texture features and global semantic information, which is essential for recognising small and irregular lodging areas in cluttered environments. Second, the Atrous Spatial Pyramid Pooling (ASPP) module was restructured using depthwise separable atrous convolutions and an optimised dilation rate scheme, which preserved the ability to fuse multi-scale contextual features while reducing computational cost. Third, a Convolutional Block Attention Module (CBAM) was embedded after shallow feature extraction to sharpen the model's focus on subtle edge and boundary information through combined channel and spatial attention.

Experimental results showed that MVDC-DeepLabv3+ achieved a mean Intersection over Union (mIoU) of 94.10%, a mean Pixel Accuracy (mPA) of 97.44%, an F1-score of 96.91%, a precision of 96.40%, and a recall of 97.45% on the constructed dataset. Compared with the original DeepLabv3+, these values improved by 3.70, 2.79, 2.09, 3.05, and 1.00 percentage points, respectively. The model size was reduced to 5.47 MB, which is 75.6% and 97.4% smaller than with the MobileNetV2 and Xception backbones, respectively, making the model suitable for deployment on embedded and resource-constrained platforms. Comparison with other mainstream semantic segmentation models, including UNet, SegNet, BiSeNet, and PSPNet, indicated that the proposed model was superior in segmenting small lodging areas and delineating complex boundaries. To further validate robustness, additional tests were conducted under low-light nighttime conditions and in dusty environments. The pixel error of lodging detection remained below 1.50% in these tests, confirming reliability under challenging visual conditions.

Finally, the segmentation model was integrated into an online detection system developed with PyQt5 and a vehicle-mounted camera interface. During real-world operation, the system achieved an average inference speed of 14.23 frames per second, with the relative pixel error maintained below 1%. These results confirmed that the proposed approach combines high segmentation accuracy, strong robustness, and real-time performance, offering an effective technical solution for intelligent wheat harvesting in precision agriculture.
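The cost saving from restructuring ASPP with depthwise separable atrous convolutions can be illustrated with simple parameter arithmetic. This is a sketch under stated assumptions rather than the paper's configuration: the 256-channel width and the dilation rates 6, 12, and 18 are the common DeepLabv3+ defaults, used here only for illustration, since the abstract does not give the optimised dilation scheme; the function names are hypothetical and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_extent(k: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel; dilation adds no weights."""
    return k + (k - 1) * (dilation - 1)


# Assumed branch width of 256 channels (illustrative, not taken from the paper).
std = standard_conv_params(256, 256)
sep = depthwise_separable_params(256, 256)
reduction = std / sep  # roughly 8.7x fewer weights per 3 x 3 branch

# Assumed dilation rates; each widens the receptive field at constant weight count.
extents = {d: effective_kernel_extent(3, d) for d in (6, 12, 18)}
```

Because dilation enlarges the receptive field without adding weights, swapping each atrous branch for its depthwise separable form keeps the multi-scale context while cutting the per-branch weight count substantially.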
