

Named Entity Recognition of Chinese Agricultural Text Based on Attention Mechanism

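  • Abstract: When building agricultural intelligent question-answering systems, traditional agricultural named entity recognition methods depend on hand-crafted feature templates and extract feature information insufficiently. Because entities appear under many different names, their tagging is also inconsistent. To address these problems, a named entity recognition method for agricultural text based on an attention mechanism is proposed. The continuous bag-of-words (CBOW) model is used to pre-train the input character embeddings, which enriches their feature information and reduces the impact of word segmentation accuracy on performance. A document-level attention mechanism is introduced to capture similarity information between entities, so that an entity keeps a consistent label across different contexts. On this basis, a model framework suited to agricultural entity recognition is built from a bi-directional long short-term memory network (BiLSTM) and a conditional random field (CRF). Recognition experiments were conducted on 4,604 agricultural texts for four entity types: diseases, pests, pesticides, and crop varieties. The results show that the model effectively identifies entities in agricultural text, alleviates inconsistent entity tagging, and performs well on the agricultural corpus, with a precision, recall, and F-score of 93.48%, 90.60%, and 92.01%, respectively. Compared with three other recognition methods, the model improved precision on corpora of every size tested, showing a clear performance advantage.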

     

    Abstract: Agricultural named entity recognition is a fundamental task for natural language processing in the agricultural field and, more importantly, a key step in constructing agricultural knowledge graphs and intelligent question-answering systems. Traditional named entity recognition (NER) methods based on the CRF model rely on large numbers of hand-crafted features; they cannot extract sufficiently effective features, nor can they resolve the inconsistent entity tagging caused by the diversity of entity names. To address these problems, an Att-BiLSTM-CRF framework based on deep learning was proposed. First, the CBOW model was used to pre-train character embeddings on a large unlabeled agricultural corpus, enriching the character features and alleviating the impact of word segmentation accuracy on model performance. Then, a document-level attention mechanism was introduced to capture similarity information between entities in the text, so that entities are tagged consistently across different contexts. Finally, building on the BiLSTM-CRF baseline model, a framework suited to agricultural named entity recognition was constructed. In total, 4,604 agricultural texts were selected for recognizing four entity types: diseases, pests, pesticides, and crop varieties. The experimental results showed that the model effectively identifies entities in agricultural text and alleviates the problem of inconsistent entity tagging. On the agricultural corpus, the model achieved a precision, recall, and F-score of 93.48%, 90.60%, and 92.01%, respectively. Compared with the LSTM, LSTM-CRF, and BiLSTM-CRF models, Att-BiLSTM-CRF showed clear advantages on corpora of different sizes.
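
The CBOW pre-training step can be illustrated with a minimal sketch, assuming character-level input and a symmetric context window; the abstract states neither the window size nor the embedding dimension, and the helper names `cbow_pairs` and `cbow_hidden` are hypothetical. Each training example is built from characters rather than segmented words, so no word segmenter is involved, which is how character-level pre-training sidesteps segmentation errors. The hidden layer averages the context vectors, as in the common CBOW formulation; in practice the vectors would be trained by a word2vec implementation rather than by hand.

```python
from typing import Dict, List, Tuple

# Illustrative sketch of character-level CBOW; helper names are hypothetical,
# not taken from the paper.


def cbow_pairs(text: str, window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context characters, target character) training pairs.

    The text is treated as a plain character sequence, so no word
    segmentation is needed before pre-training.
    """
    chars = list(text)
    pairs = []
    for i, target in enumerate(chars):
        context = chars[max(0, i - window):i] + chars[i + 1:i + 1 + window]
        if context:  # a one-character text has no context to learn from
            pairs.append((context, target))
    return pairs


def cbow_hidden(context: List[str], emb: Dict[str, List[float]]) -> List[float]:
    """CBOW hidden layer: the element-wise mean of the context embeddings."""
    dim = len(emb[context[0]])
    total = [0.0] * dim
    for ch in context:
        for k, value in enumerate(emb[ch]):
            total[k] += value
    return [v / len(context) for v in total]


if __name__ == "__main__":
    # "稻瘟病" (rice blast) is used only as an example disease name.
    print(cbow_pairs("稻瘟病", window=1))
    # [(['瘟'], '稻'), (['稻', '病'], '瘟'), (['瘟'], '病')]
```

The model is then trained to predict each target character from `cbow_hidden(context, emb)`, which pushes characters that occur in similar agricultural contexts toward similar vectors.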

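The document-level attention step can be sketched as follows, assuming that attention scores are plain dot products between BiLSTM hidden states. The abstract does not give the scoring function, so this choice and the names `document_attention`, `softmax`, and `dot` are illustrative assumptions. Every character attends over all characters in the same document rather than only its own sentence, so repeated mentions of one entity can share context; in this sketch the attended vector is concatenated with the original state to form the input to the next layer.

```python
import math
from typing import List

# Illustrative sketch; dot-product scoring and all names are assumptions.
Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def document_attention(states: List[Vector]) -> List[Vector]:
    """Attend from every token state h_i over ALL states h_j of the document.

    Returns [h_i ; g_i] per token, where g_i = sum_j a_ij * h_j and
    a_ij = softmax over j of (h_i . h_j).
    """
    outputs = []
    for h_i in states:
        weights = softmax([dot(h_i, h_j) for h_j in states])
        g_i = [sum(w * h_j[k] for w, h_j in zip(weights, states))
               for k in range(len(h_i))]
        outputs.append(h_i + g_i)  # list concatenation = vector concatenation
    return outputs


if __name__ == "__main__":
    # Toy states: tokens 0 and 2 look alike (e.g. two mentions of one pest),
    # token 1 is unrelated.
    states = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
    weights = softmax([dot(states[0], h) for h in states])
    print(weights[0] == weights[2] > weights[1])  # True
```

Because similar mentions receive larger weights, the context vector `g_i` for one mention carries information from the others, which is the intuition behind the claimed gain in tagging consistency.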
     
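In a BiLSTM-CRF model, the CRF layer chooses the jointly best tag sequence rather than the best tag at each position independently. The sketch below is a generic linear-chain Viterbi decoder, assuming a three-tag O/B/I scheme encoded as 0/1/2 and hand-set scores. In a trained model the emission scores would come from the BiLSTM and the transition scores would be learned; the function name `viterbi_decode` is hypothetical.

```python
from typing import List, Sequence

# Illustrative linear-chain CRF decoder; not the authors' implementation.


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Best tag-index path for a linear-chain CRF.

    emissions[t][k]: score of tag k at position t (e.g. a BiLSTM output)
    transitions[i][j]: score of tag i being followed by tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # previous tag i that maximizes the score of landing on tag j
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(num_tags), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    O, B, I = 0, 1, 2
    transitions = [[0.0] * 3 for _ in range(3)]
    transitions[O][I] = -100.0  # "O followed by I" is made effectively illegal
    emissions = [[1.0, 0.5, 0.0],   # position 0: O scores highest on its own
                 [0.0, 0.0, 1.0]]   # position 1: I scores highest
    # Greedy per-position argmax would give [O, I]; the CRF avoids O then I.
    print(viterbi_decode(emissions, transitions))  # [1, 2], i.e. B I
```

The transition scores are what let a CRF rule out ill-formed sequences such as an entity-internal tag directly after an outside tag, something a plain LSTM with per-position softmax cannot enforce.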

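The reported precision, recall, and F-score are entity-level metrics, with F = 2PR / (P + R). The following minimal sketch assumes BIO tags and exact span matching, since the abstract specifies neither the tagging scheme nor the matching rule. The tag names `B-DIS` and `B-PEST` and the helper names are hypothetical. As a sanity check, 2 × 93.48 × 90.60 / (93.48 + 90.60) ≈ 92.02, which matches the reported 92.01% up to rounding of P and R.

```python
from typing import List, Optional, Set, Tuple

# Illustrative evaluation sketch; tag names and helpers are hypothetical.
Span = Tuple[int, int, str]  # (start, end exclusive, entity type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from BIO tags, e.g. ["B-DIS", "I-DIS", "O"]."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the open entity continues
        if etype is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            # a stray I- (no matching B-) is treated as the start of an entity
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[List[str]],
                        pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity precision, recall and F-score."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tags: DIS = disease, PEST = pest.
    gold = [["B-DIS", "I-DIS", "O", "B-PEST"]]
    pred = [["B-DIS", "I-DIS", "O", "O"]]  # the pest mention is missed
    print(precision_recall_f1(gold, pred))  # (1.0, 0.5, 0.6666666666666666)
```

Exact matching means a span with a correct type but a boundary off by one character counts as both a false positive and a false negative, which is the usual convention for NER scoring.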