Abstract:
Multi-label classification algorithms based on deep learning still face several problems: the correlations between labels are not fully exploited, and small targets remain harder to recognize than large ones. In this paper, we propose a multi-label image classification algorithm that uses the split-attention network ResNeSt for feature extraction and a dual-branch Transformer to query class labels. In addition, the cross-attention module in the Transformer decoder adaptively extracts local features. On this basis, to further strengthen the Transformer module's classification performance, we introduce BatchFormerV2, which turns the Transformer into a dual-branch network. The model achieves 88.4% mAP on the COCO dataset and 96.0% average precision on the VOC2007 dataset, improving the accuracy of multi-label image classification to a certain extent.
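The core mechanism described above, class-label embeddings acting as queries that cross-attend to spatial features from the backbone, can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention, random weights, and the dot-product scoring head are illustrative assumptions, and the feature map stands in for a ResNeSt output.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_query_cross_attention(feat, label_emb, Wq, Wk, Wv):
    """Single-head cross-attention where label embeddings are the queries.

    feat:      (HW, d) flattened spatial features from the backbone
    label_emb: (C, d)  one learnable embedding per class label
    Returns:   (C, d)  a per-class feature pooled from the locations
               each label attends to (the 'adaptive local feature').
    """
    Q = label_emb @ Wq                              # (C, d)
    K = feat @ Wk                                   # (HW, d)
    V = feat @ Wv                                   # (HW, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (C, HW), rows sum to 1
    return attn @ V                                 # (C, d)

rng = np.random.default_rng(0)
d, HW, C = 64, 49, 80                               # e.g. a 7x7 map, 80 COCO classes
feat = rng.standard_normal((HW, d))                 # stand-in for ResNeSt features
label_emb = rng.standard_normal((C, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

per_class = label_query_cross_attention(feat, label_emb, Wq, Wk, Wv)
# Hypothetical scoring head: dot each pooled feature with its label embedding.
logits = (per_class * label_emb).sum(axis=-1)       # (C,) one score per label
print(per_class.shape, logits.shape)
```

Because each class query forms its own attention distribution over locations, a class whose evidence occupies only a few spatial positions (a small target) can still pool a focused feature, which is the motivation for using decoder cross-attention here.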