Abstract:
Early identification is essential for the effective prevention and control of grape leaf diseases. However, accurate segmentation is hindered by the varying sizes and diverse shapes of grape leaves and their diseased areas, as well as by complex backgrounds and edge blurriness caused by lighting interference. Moreover, existing models typically improve performance at the cost of increased model size and computational complexity, which hinders their deployment on resource-constrained mobile devices. In this study, a multi-scale cross-fusion and boundary-aware segmentation network (MCBNet) was proposed for grape leaf disease segmentation, aiming to reduce computational cost while maintaining high segmentation accuracy. A multi-scale cross-fusion decoder was developed to effectively integrate feature maps from different scales. Multi-scale strip convolutional kernels and a cross-axis attention mechanism were used to capture multi-scale global features. Additionally, a boundary-aware guidance module was introduced to make the model more sensitive to boundary features, thereby enhancing segmentation performance on diseased areas of varying sizes with blurred edges. The experimental results show that: 1) MCBNet performed best among the compared networks on the grape leaf disease dataset. Specifically, in the leaf segmentation task, MCBNet improved the Dice and IoU metrics by 0.6 and 1.1 percentage points, respectively, compared with the second-best network. In the disease segmentation task, the Dice and IoU metrics improved by 1.3 and 1.9 percentage points, respectively. The HD metric was used to measure the accuracy of the segmentation boundary, and MCBNet outperformed the second-best network on it by 4.0 and 0.4 percentage points in the leaf and disease segmentation tasks, respectively.
Additionally, MCBNet improved the Dice and HD metrics by 1.3 and 1.6 percentage points, respectively, compared with the lightweight MetaSeg network. This performance was achieved with only 3.75M parameters and a computational cost of 1.61 GFLOPs, giving a good balance between high segmentation accuracy and low computational cost. 2) The generalization of MCBNet was further validated on the public PlantVillage dataset. In the disease segmentation task, MCBNet reached Dice, IoU, Se, and Pre values of 85.2%, 74.2%, 83.8%, and 86.5%, respectively, surpassing the second-best network. Furthermore, MCBNet outperformed the second-best network by 4.38 percentage points in the HD metric, indicating better performance on blurred boundaries. 3) Visualization results also confirmed that MCBNet captured disease regions of various sizes on both the self-built and public datasets, significantly reducing missed detections. Moreover, the boundary-aware guidance module greatly improved the handling of edge details, further supporting the segmentation performance of MCBNet. In conclusion, MCBNet can offer an efficient and precise solution for grape leaf disease segmentation in complex environments, and its lightweight design is suitable for deployment on resource-constrained devices. Some limitations remain. To balance operational efficiency and deployment requirements, a lightweight backbone network was used for feature extraction. However, the lightweight backbone may limit feature extraction capacity and, in turn, the segmentation accuracy of disease regions. Future research can optimize the network structure to capture disease regions more precisely. Additionally, model training still relies on a large amount of high-quality pixel-level labeled data, which is time-consuming and costly to obtain.
Therefore, weakly supervised or semi-supervised learning can be introduced to reduce the reliance on fine-grained annotations and lower data preparation costs. Finally, domain adaptation can be added to enhance stability and generalization in variable and complex environments, such as strong lighting or partial leaf occlusion.
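
For readers unfamiliar with the overlap metrics reported above, the following is a minimal sketch of how Dice, IoU, Se, and Pre are commonly computed for a pair of binary masks. It is not taken from the MCBNet implementation: the function names `confusion_counts` and `segmentation_metrics` are hypothetical, Se and Pre are assumed to denote sensitivity (recall) and precision, and the HD metric is omitted.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    between two binary masks given as nested lists of 0/1 values."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # predicted diseased, actually diseased
            elif p and not t:
                fp += 1          # predicted diseased, actually background
            elif t and not p:
                fn += 1          # missed diseased pixel
    return tp, fp, fn


def segmentation_metrics(pred, target, eps=1e-8):
    """Return Dice, IoU, sensitivity (Se), and precision (Pre).

    Assumption: Se = TP / (TP + FN) and Pre = TP / (TP + FP);
    eps only guards against division by zero on empty masks.
    """
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "se": tp / (tp + fn + eps),
        "pre": tp / (tp + fp + eps),
    }


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
scores = segmentation_metrics(pred_mask, true_mask)
```

On this toy pair, Dice is about 0.667 and IoU is 0.5, which illustrates why the IoU values in the abstract are consistently lower than the corresponding Dice values for the same predictions.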