Abstract:
Wheat harvesting robots frequently encounter missed harvesting in lodging areas, incomplete harvesting processes, and header blockages during field operations. To address these challenges, this study developed an online detection method for wheat lodging areas based on an improved DeepLabv3+ semantic segmentation framework. The objective was to enhance the environmental perception and recognition capabilities of harvesting robots operating in complex and variable farmland conditions, thereby supporting efficient, low-loss mechanised harvesting. A specialised dataset of 4,250 high-resolution images of lodging scenarios in wheat fields was constructed using a vehicle-mounted ZED2i binocular RGB camera. Data collection was conducted in June 2023 in the main winter wheat production zone of Xincao Farm, Yancheng, Jiangsu Province, China. To improve the model's generalisation under diverse field conditions, data augmentation strategies including random colour adjustment, image rotation, and contrast enhancement were applied, simulating changes in ambient lighting and motion blur induced by machinery vibration. Each image was annotated at the pixel level using the LabelMe tool to support semantic segmentation training and evaluation. A lightweight segmentation model, named MVDC-DeepLabv3+, was developed to meet the demands of real-time field deployment. The model introduced three architectural enhancements. First, the original Xception backbone of DeepLabv3+ was replaced by MobileViT, a lightweight hybrid architecture that combines convolutional neural network and Transformer components. This change significantly reduced model complexity while improving the capacity to capture both local texture features and global semantic information, which is essential for recognising small-scale and irregular lodging areas in cluttered environments. 
Second, the Atrous Spatial Pyramid Pooling (ASPP) module was restructured using depthwise separable atrous convolutions and an optimised dilation rate scheme, which preserved the ability to fuse multi-scale contextual features while reducing computational cost. Third, a Convolutional Block Attention Module (CBAM) was embedded after shallow feature extraction to strengthen the model's focus on subtle edge and boundary information through combined channel and spatial attention. Experimental results demonstrated that MVDC-DeepLabv3+ achieved high segmentation accuracy on the constructed dataset. The model obtained a mean Intersection over Union (mIoU) of 94.10%, a mean Pixel Accuracy (mPA) of 97.44%, an F1-score of 96.91%, a precision of 96.40%, and a recall of 97.45%. Compared with the original DeepLabv3+, these metrics improved by 3.70, 2.79, 2.09, 3.05, and 1.00 percentage points, respectively. The total model size was reduced to 5.47 MB, which is 75.6% and 97.4% smaller than the models built on the MobileNetV2 and Xception backbones, respectively, making it suitable for deployment on embedded and resource-constrained platforms. Comparative analysis with other mainstream semantic segmentation models, namely UNet, SegNet, BiSeNet, and PSPNet, indicated that the proposed model performed better at segmenting small lodging areas and delineating complex boundaries. To further validate the model's robustness, additional tests were conducted under low-light nighttime conditions and in dusty environments. The model consistently achieved lodging detection pixel errors below 1.50%, confirming its reliability under challenging visual conditions. Finally, the segmentation model was integrated into an online detection system developed with PyQt5 and a vehicle-mounted camera interface. During real-world operation, the system achieved an average inference speed of 14.23 frames per second, with the relative pixel error maintained below 1%. 
These results confirmed that the proposed approach combined high segmentation accuracy, strong robustness, and real-time performance, offering an effective technical solution for intelligent wheat harvesting applications in precision agriculture.
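
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mIoU, mPA, precision, recall, and F1 are commonly derived from a pixel-level confusion matrix. It assumes the usual macro-averaged definitions (under which mPA coincides with mean per-class recall); the abstract does not state the exact formulas used in this study, so they may differ in detail. The function names and the toy labels (0 = normal wheat, 1 = lodging) are illustrative only.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] = number of pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Macro-averaged metrics (assumed definitions, not taken from the paper)."""
    n = len(cm)
    ious, recalls, precisions = [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels labelled as c
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
    precision = sum(precisions) / n
    recall = sum(recalls) / n
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "mIoU": sum(ious) / n,
        "mPA": recall,  # per-class pixel accuracy equals per-class recall here
        "precision": precision,
        "recall": recall,
        "F1": f1,
    }


# Toy example: flattened ground-truth and predicted masks of eight pixels.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
metrics = segmentation_metrics(confusion_matrix(truth, pred, num_classes=2))
```

In this toy case each class has three correctly labelled pixels, one false positive, and one false negative, so the per-class IoU is 3/5 and the per-class pixel accuracy is 3/4.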