
LERT-based multi-feature fusion approach for named entity recognition in cattle health farming

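  • Summary: To address the scarcity of Chinese corpora and the difficulty of named entity recognition (NER) in the cattle health farming domain, this study constructed CHNERC, a Chinese NER corpus that contains 17 entity categories and covers multiple dimensions such as the housing environment, disease prevention and control, and feed and drinking water, and proposed L-BISC, a cattle health farming NER model based on LERT multi-feature fusion. In the vector representation layer, LERT is used as the pre-trained language model, which improves Chinese semantic processing while effectively capturing contextual information in long texts. In the feature extraction layer, a bidirectional long short-term memory network (BiLSTM) and an iterated dilated convolutional neural network (IDCNN) jointly extract features, strengthening the sequence representation and further fusing long-distance dependencies with local features. In the feature fusion layer, scaled dot-product multi-head attention (SDPMHA) is introduced to strengthen the modeling of long-distance dependencies and to improve the accuracy of entity boundary recognition and category discrimination. In the decoding stage, a conditional random field (CRF) globally optimizes the label sequence so that the output is structurally valid. The results show that L-BISC achieved a precision of 90.45%, a recall of 90.76%, and an F1 score of 90.57% on the cattle health farming corpus, outperforming mainstream models such as BERT and RoBERTa. Precision exceeded 80% for all 17 entity categories, indicating that L-BISC makes effective use of textual semantic and sequential features. The multi-model feature extraction scheme BiLSTM+IDCNN+SDPMHA contributed positively to the performance and feature fusion ability of L-BISC. This study enriches the Chinese NER corpora available for cattle health farming, provides a high-precision method for NER in this domain, and can serve as a methodological reference for natural language processing tasks in related fields.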
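The role of the CRF decoding stage can be illustrated with a minimal Viterbi sketch in plain Python. This is not the authors' implementation: a trained CRF learns real-valued transition scores, whereas this sketch assumes hard BIO constraints only (an `I-X` tag may follow only `B-X` or `I-X`). The tag name `DIS` and all emission scores are hypothetical values chosen for illustration.

```python
NEG_INF = -1e9  # large negative score used to forbid a transition


def allowed(prev_tag, tag):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if tag.startswith("I-"):
        return prev_tag in ("B-" + tag[2:], "I-" + tag[2:])
    return True


def viterbi(emissions, tags):
    """Return the highest-scoring valid tag sequence.

    emissions: one dict per token mapping tag -> score (hypothetical values).
    """
    # Best score of any path ending in each tag at the first token;
    # a sequence may not start with an I- tag.
    score = {t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            # Pick the previous tag that maximizes path score plus transition.
            best_prev = max(
                tags,
                key=lambda p: score[p] + (0.0 if allowed(p, t) else NEG_INF),
            )
            trans = 0.0 if allowed(best_prev, t) else NEG_INF
            new_score[t] = score[best_prev] + trans + emit[t]
            pointers[t] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B-DIS", "I-DIS"]
emissions = [
    {"O": 1.0, "B-DIS": 0.8, "I-DIS": 0.1},
    {"O": 0.2, "B-DIS": 0.3, "I-DIS": 1.0},
    {"O": 0.1, "B-DIS": 0.2, "I-DIS": 0.9},
]
greedy = [max(tags, key=lambda t: e[t]) for e in emissions]
decoded = viterbi(emissions, tags)
```

Here the token-wise argmax (`greedy`) yields `O, I-DIS, I-DIS`, which is not a valid BIO sequence, while the globally decoded path (`decoded`) is `B-DIS, I-DIS, I-DIS`. This is the sense in which sequence-level decoding keeps the output structurally valid.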

     

    Abstract: Health knowledge about cattle, in both beef and dairy production, is one of the most important components of intelligent, data-driven livestock farming. It spans multiple dimensions, such as the housing environment, disease prevention, breeding, nutrition, and feed and water regulation, all of which are closely related to animal welfare and productivity in sustainable production. However, high-quality Chinese textual resources for cattle health research are still lacking. In particular, the shortage of annotated corpora for named entity recognition (NER) has limited knowledge extraction, intelligent monitoring, and decision making in precision livestock farming. Compared with general text, cattle health text is characterized by highly diverse entity types, complex and nested entity structures, uneven data distribution, and frequent domain-specific terminology. General-purpose models, such as BERT, cannot fully meet the requirement of accurate entity recognition in this setting; domain adaptation is often required to identify long-tail or low-frequency entities. In this study, a Chinese NER corpus was constructed for cattle health. The dataset covers 17 entity categories, including diseases, drugs, feed, physiological indicators, operations, and environmental factors. A multi-feature fusion NER model was proposed on top of LERT (Linguistically-motivated bidirectional Encoder Representation from Transformer). At the representation layer, LERT was employed as the pre-trained language model to enhance Chinese semantic comprehension and to capture long-range contextual dependencies in cattle-domain text.
At the feature extraction layer, a bidirectional long short-term memory (BiLSTM) network and an iterated dilated convolutional neural network (IDCNN) were combined to integrate global and local context during representation learning: the BiLSTM captures long-range dependencies, while the IDCNN efficiently extracts local features. At the feature fusion layer, a scaled dot-product multi-head attention mechanism was introduced to strengthen the modeling of long-distance dependencies for entity boundary and category identification. At the decoding stage, a conditional random field (CRF) layer globally optimized the label sequences to keep the outputs structurally consistent. Experiments showed that the model achieved a precision of 90.45%, a recall of 90.76%, and an F1-score of 90.57% on the corpus, outperforming baseline models such as BERT and RoBERTa. Precision exceeded 80% for every entity category, indicating stable recognition across categories. Ablation experiments verified that both the multi-head attention mechanism and the combination of BiLSTM with IDCNN contributed significantly to feature fusion and overall performance. The study enriches the Chinese NER resources for cattle health and provides a high-precision, domain-adaptive approach to entity recognition in this field. It can also serve as a reference for natural language processing in intelligent livestock farming.
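For readers unfamiliar with how precision, recall, and F1 are usually computed for NER, the following is a minimal sketch of entity-level scoring over BIO tag sequences. It assumes exact-match scoring of (start, end, type) spans pooled over all entities; the abstract does not state which matching or averaging scheme the authors used, so this is an illustration rather than their evaluation script. The tag names `DIS` and `DRUG` and the example sequences are hypothetical.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence (end exclusive)."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(gold, pred):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-DIS", "I-DIS", "O", "B-DRUG", "O"]
pred = ["B-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG"]
precision, recall, f1 = prf(gold, pred)
```

In this example the disease span matches exactly, but the predicted drug span is one token too long, so it counts as an error for both precision and recall: each of the three scores is 0.5. Under exact matching, boundary mistakes are penalized as heavily as type mistakes, which is why boundary identification matters for the reported scores.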

     
