Abstract:
To address the problem that existing crop disease leaf detection methods do not locate diseased leaf regions accurately enough using image features, a new crop disease leaf detection method based on multi-modal feature alignment is proposed. In the training phase, the images and text in a crop leaf dataset are first encoded by a visual encoder and a text encoder, respectively. The diseased areas in a given image are located from the visual encoding features, and the visual and text encoding features are integrated to achieve fine-grained classification of the disease type in each diseased area. In the inference phase, the pretrained disease area localization module locates the diseased areas in a given test image, and the extracted diseased areas are fed into the pretrained classification model. Finally, the similarity between the predicted text values and the original labels in the text set is calculated to obtain a rapid fine-grained classification result for the diseased area. Tests on several open-source crop disease datasets show that the proposed method achieves precision of 0.9574, 0.9611, 0.9580, and 0.9502 on the potato, tomato, apple, and strawberry datasets, respectively. It has better overall performance and good practical application value.
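
The final inference step, matching a predicted text value against the labels in the text set, can be illustrated with a minimal sketch. The abstract does not state which similarity measure or embedding size is used, so cosine similarity, the three-dimensional toy vectors, the label names, and the function names `cosine_similarity` and `classify_by_similarity` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def classify_by_similarity(predicted, label_embeddings):
    """Return the label whose embedding is most similar to the prediction."""
    scores = {
        label: cosine_similarity(predicted, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical text-set embeddings; a real system would obtain these
# from the text encoder rather than hard-coding them.
label_embeddings = {
    "potato early blight": [1.0, 0.0, 0.0],
    "potato late blight": [0.0, 1.0, 0.0],
    "potato healthy": [0.0, 0.0, 1.0],
}

# Hypothetical predicted text value for one extracted diseased area.
predicted = [0.1, 0.9, 0.2]

best_label, scores = classify_by_similarity(predicted, label_embeddings)
print(best_label)
```

Because the label embeddings can be computed once and reused, classifying each extracted area reduces to a handful of dot products, which is one plausible reading of why the abstract describes this step as rapid.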