Agricultural Short Text Matching Technology Based on Multi-semantic Features

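  • Summary: The agricultural Q&A community of the "China Agro-technical Extension APP" (中国农技推广APP) faces problems that hamper semantic text matching: a large volume of questions, poor standardization, broad topical coverage, heavy noise, and sparse features. To improve similarity judgment on agricultural question data, a text matching model that fuses multi-semantic features, Co_BiLSTM_CNN, is proposed. It extracts short-text features at three levels (deep semantics, word co-occurrence, and maximum matching degree) and uses a Siamese network structure with shared parameters, building the matching model from bidirectional long short-term memory networks, convolutional neural networks, and densely connected networks. Experimental results show that the model extracts text features more comprehensively and judges text similarity with an accuracy of 94.15%, a clear advantage in matching performance over the six other models compared.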

     

    Abstract: With the development of information technology, agricultural information consulting services based on the mobile Internet have become an important part of the agro-technical extension system. The agro-technical extension Q&A community has collected more than ten million questions in total. As the community grows in popularity, manual answering by agricultural experts and technicians alone can neither keep pace with the rapid growth in questions nor meet farmers' need for quick and accurate answers. Agricultural intelligent Q&A is one effective way to solve this problem, and high-quality text matching for new questions is its key technology. The accuracy of text matching is limited by the characteristics of agricultural text: large data volume, poor standardization, wide topical range, heavy noise, and sparse features. To improve accuracy, three kinds of features were extracted from agricultural short texts (deep semantics, word co-occurrence, and maximum matching degree), and the Co_BiLSTM_CNN model was proposed to extract these multi-semantic features. The model combines bidirectional long short-term memory (BiLSTM) networks, convolutional neural networks (CNN), and densely connected networks within a Siamese network structure with shared parameters. Precision, recall, F1, accuracy, and time complexity were selected as evaluation indexes to measure the model's performance comprehensively. The experimental results showed that the model could extract text features more comprehensively, with an accuracy of 94.15%, and that it showed obvious advantages over the six other text matching models compared.
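
    The four classification metrics named in the abstract (precision, recall, F1, accuracy) can be illustrated with a minimal sketch. It assumes the matching task is scored as binary classification, with label 1 for a matching question pair and 0 for a non-matching pair. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy for binary labels.

    Assumption: 1 means "the two questions match", 0 means "no match".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for five question pairs.
    gold = [1, 1, 0, 0, 1]
    pred = [1, 0, 0, 1, 1]
    print(binary_metrics(gold, pred))
```

    Accuracy alone can look high when matching and non-matching pairs are unbalanced, which is one reason to report precision, recall, and F1 alongside it.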

     
