Multi-layer and regional feature integration models for fine-grained visual classification

刘宇泽; 孙涵; 李明洋; 李明心; 康巨涛; 王恩浩

doi:10.13733/j.jcam.issn.2095-5553.2023.01.028

Abstract

In fine-grained image recognition, images are highly similar overall and differ only in local details, so locating the discriminative regions and learning their features is essential. Existing approaches have several drawbacks: manually annotating discriminative regions is costly, and many models require a large number of additional network structures, which adds computational overhead during both training and inference. To address these problems, this paper proposes a multi-layer and regional feature fusion model. The model is built on the attention mechanism, which mimics how humans observe a scene, to strengthen its focus on informative local details and improve recognition performance on classical datasets. It consists of two parts: multi-layer fusion of convolutional neural network features using attention weights, and region fusion based on the dependencies among regional features. During feature extraction, the model considers both the detailed and the abstract features of an image, as well as the composition of different regions and the dependencies among them, so that local details contribute to the prediction while the global context is retained. Experimental results show good accuracy on several classical datasets: 95.69% on the Oxford Flowers dataset and 96.96% on the AID (aerial image) dataset. According to the authors, no model had previously been studied or trained on the AID dataset.
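The abstract names two components: multi-layer fusion with attention weights, and region fusion based on dependencies among regional features. The sketch below only illustrates how such a combination could look in principle. It is not the authors' implementation: the softmax layer weighting, the scaled dot-product affinity between regions, the mean pooling, and all names (`fuse_layers`, `fuse_regions`, the toy feature values) are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def fuse_layers(layer_feats, layer_scores):
    """Assumed multi-layer fusion: softmax-normalise one score per layer
    and take the attention-weighted sum of the per-layer feature vectors."""
    weights = softmax(layer_scores)
    return weighted_sum(weights, layer_feats), weights


def fuse_regions(region_feats):
    """Assumed region fusion: each region attends to all regions via
    scaled dot-product affinity, then the contexts are mean-pooled."""
    dim = len(region_feats[0])
    scale = math.sqrt(dim)
    contexts, attn = [], []
    for query in region_feats:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in region_feats]
        weights = softmax(scores)
        attn.append(weights)
        contexts.append(weighted_sum(weights, region_feats))
    n = len(contexts)
    pooled = [sum(c[i] for c in contexts) / n for i in range(dim)]
    return pooled, attn


# Toy inputs (hypothetical values): 3 layers and 4 regions, 3-dim features.
layer_feats = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [0.0, 1.0, 1.0]]
layer_scores = [0.1, 0.5, 1.0]  # stand-ins for learned attention scores
region_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

fused_layers, layer_weights = fuse_layers(layer_feats, layer_scores)
fused_regions, region_attn = fuse_regions(region_feats)

# Concatenate both fused vectors into one descriptor for a classifier.
descriptor = fused_layers + fused_regions
```

In a trained network the layer scores and the region affinities would be produced by learned parameters rather than fixed by hand; concatenation is just one simple way to combine the two fused vectors.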
