Abstract:
In fine-grained visual classification, images are highly similar overall and differ only in local details, so it is essential to locate the discriminative regions and learn their features. Existing approaches have several problems: manually labeling discriminative regions is expensive, many models require a large number of additional network structures, and these structures add computational overhead during both training and inference. To address these problems, this paper proposes a multi-layer and regional feature integration model. The model is built on the attention mechanism, which imitates the way humans observe a scene, in order to strengthen its focus on valuable local details and to improve recognition on classical data sets. It consists of two parts: a multi-layer fusion of convolutional neural network features using attention weights, and a region fusion based on the dependencies among regional features. The attention mechanism lets the model consider both the detailed and the abstract features of an image, together with the composition of its regions and the dependencies among them, so that local details contribute to the prediction while the global context is retained. Experimental results show good accuracy on some classical data sets, for example 95.69% on the Oxford Flowers data set and 96.96% on the AID (aerial image) data set. To our knowledge, no model has previously been studied and trained on the AID data set.
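The two fusion steps can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the function names (`fuse_layers`, `fuse_regions`), the use of one scalar score per layer, and the scaled dot-product similarity between regions are all assumptions chosen for illustration. In a real model the layer scores and the feature vectors would be produced by the convolutional network and learned during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine equal-length vectors using one weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def fuse_layers(layer_features, layer_scores):
    """Multi-layer fusion (assumed form): softmax the per-layer scores into
    attention weights, then take the weighted sum of the layer features."""
    weights = softmax(layer_scores)
    return weighted_sum(weights, layer_features), weights


def fuse_regions(region_features):
    """Region fusion (assumed form): every region attends to all regions via
    scaled dot-product similarity, and the attended regions are averaged."""
    scale = math.sqrt(len(region_features[0]))
    attended = []
    for query in region_features:
        weights = softmax([dot(query, key) / scale for key in region_features])
        attended.append(weighted_sum(weights, region_features))
    uniform = [1.0 / len(attended)] * len(attended)
    return weighted_sum(uniform, attended)


# Toy inputs (hypothetical): one shallow "detail" vector and one deep
# "abstract" vector with equal scores, plus three 2-D region descriptors.
shallow = [1.0, 0.0, 0.0, 0.0]
deep = [0.0, 1.0, 0.0, 0.0]
fused, layer_weights = fuse_layers([shallow, deep], layer_scores=[0.0, 0.0])

regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_descriptor = fuse_regions(regions)
```

With equal layer scores the fusion reduces to a plain average of the two layers. Unequal scores shift the weight toward the shallow (detail) or deep (abstract) features, which is the balance between local detail and global context that the abstract describes.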