Densely Connected BiGRU Neural Network Based on BERT and Attention Mechanism for Chinese Agriculture-related Question Similarity Matching
Abstract: To enable fast and automatic detection of semantically equivalent question texts in agricultural Q&A communities, a BERT-based Attention-DenseBiGRU question similarity matching model was proposed. According to the characteristics of agricultural text, a 12-layer Chinese pre-trained BERT model was used to vectorize the text data and was compared with the Word2Vec, GloVe, and TF-IDF methods; the comparison showed that BERT effectively alleviates the high dimensionality and sparsity of agricultural text and handles polysemous words that take different meanings in different contexts. Each layer of the network used the connection information of the attention features together with the hidden features of all preceding recurrent layers; to counter the growth of the feature-vector size caused by dense concatenation, an autoencoder was applied at the end of the model for feature dimensionality reduction. The experimental results showed that the BERT-based Attention-DenseBiGRU model improves the utilization of text features, reduces feature loss, and achieves fast and accurate similarity matching of agricultural questions, reaching a precision of 97.2% and an F1 score of 97.6% on the agricultural question-pair dataset constructed in this paper, a clear improvement over six other question similarity matching models.
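The sketch below illustrates, under stated assumptions, the pipeline described in the abstract: a 12-layer Chinese BERT encoder for vectorization, stacked BiGRU layers whose inputs densely concatenate all preceding hidden features, attention pooling, an autoencoder-style bottleneck to shrink the concatenated features, and a pair classifier. It is not the authors' released implementation; the class name `DenseBiGRUMatcher`, the layer sizes, and the single-linear bottleneck are hypothetical choices made for illustration, using PyTorch and the HuggingFace transformers library.

```python
# Minimal, illustrative sketch of a BERT + Attention + densely connected BiGRU
# question-pair matcher. All hyperparameters are assumptions, not the paper's values.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel


class DenseBiGRUMatcher(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden=128, layers=3, reduced=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)   # 12-layer Chinese BERT encoder
        in_dim = self.bert.config.hidden_size              # 768-dimensional token vectors
        self.grus = nn.ModuleList()
        for _ in range(layers):
            self.grus.append(nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True))
            in_dim += 2 * hidden                           # dense connection: concat every earlier output
        self.attn = nn.Linear(in_dim, 1)                   # token-level attention scores
        self.encoder = nn.Linear(in_dim, reduced)          # autoencoder-style bottleneck for dimension reduction
        self.classifier = nn.Linear(2 * reduced, 2)        # similar / not similar

    def encode(self, input_ids, attention_mask):
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        feats = x
        for gru in self.grus:
            out, _ = gru(feats)
            feats = torch.cat([feats, out], dim=-1)        # dense splicing of hidden features
        weights = torch.softmax(self.attn(feats).squeeze(-1), dim=1)
        pooled = torch.bmm(weights.unsqueeze(1), feats).squeeze(1)  # attention pooling over tokens
        return torch.relu(self.encoder(pooled))            # reduced sentence vector

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        va = self.encode(ids_a, mask_a)
        vb = self.encode(ids_b, mask_b)
        return self.classifier(torch.cat([va, vb], dim=-1))  # logits for the question pair


tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = DenseBiGRUMatcher()
# Two sample agricultural questions with the same intent (yellowing maize seedling leaves).
a = tokenizer("玉米苗期叶子发黄是什么原因?", return_tensors="pt")
b = tokenizer("玉米幼苗叶片发黄怎么办?", return_tensors="pt")
logits = model(a["input_ids"], a["attention_mask"], b["input_ids"], b["attention_mask"])
print(logits.softmax(dim=-1))  # predicted probability that the two questions match
```

In this sketch the feature dimension grows by 2×hidden after every BiGRU layer, which mirrors the dense-concatenation growth the abstract mentions; the final linear bottleneck stands in for the autoencoder used to keep the sentence representation compact before classification.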