Research on Two-Channel Image Description Based on Knowledge Enhancement and Attention Mechanism

陶云松; 张丽红

Abstract

Conventional image description methods take only the image as input. Because the internal parameter changes during end-to-end training are difficult to observe, errors are likely to occur. To further reduce the uncertainty of image description, a knowledge enhancement method is applied to the task: theme information from the image is supplied at the input to narrow the scope of the description. A new two-channel image description architecture is proposed, consisting of a theme channel and an image channel. The theme channel extracts semantic information and uses it as theme information to enhance the image information with knowledge; the image channel performs the classic image description task. In both channels, features are extracted by a Faster R-CNN (faster region-based convolutional neural network) encoder, screened by an attention mechanism, and decoded by a long short-term memory (LSTM) network to predict information. Finally, another LSTM network combines the information from the two channels, so that the theme channel enhances the image channel with knowledge, and generates the description. The method is tested on the Flickr and MS COCO datasets, and its accuracy is improved compared with general image description methods.
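As a rough illustration of the feature-screening and fusion steps described in the abstract, the sketch below applies soft attention to the region features of each channel and concatenates the two context vectors into the input that a fusion LSTM would receive. It is not the authors' implementation: the dot-product scoring, the function names (`attend`, `fuse_inputs`), the toy feature vectors, and the use of plain concatenation are all assumptions made for illustration, and the Faster R-CNN encoder and the LSTM decoders are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Soft attention (assumed dot-product scoring).

    regions: list of region feature vectors (stand-ins for encoder outputs).
    query:   vector standing in for the decoder's hidden state.
    Returns the attention-weighted context vector and the weights.
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return context, weights


def fuse_inputs(theme_context, image_context):
    """Hypothetical fusion input: concatenate the two channel contexts.

    In the described architecture a further LSTM would consume this vector.
    """
    return theme_context + image_context


# Toy region features for the two channels (made-up numbers).
theme_regions = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
query = [1.0, 0.0]

theme_ctx, theme_w = attend(theme_regions, query)
image_ctx, image_w = attend(image_regions, query)
fused = fuse_inputs(theme_ctx, image_ctx)
```

With this toy query, the region most aligned with the query receives the largest weight in each channel, and `fused` has twice the dimensionality of a single channel's context vector.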
