A LERT-based multi-feature fusion approach for named entity recognition in cattle health farming

付振; 孙伟; 曹姗姗; 孔繁涛

doi:10.11975/j.issn.1002-6819.202509096

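Abstract

Knowledge of healthy cattle farming (beef and dairy cattle) is an important foundation for intelligent decision-making on ranches. However, Chinese text corpora in this field are scarce, and high-quality corpora for named entity recognition (NER) are especially lacking. NER in this domain also suffers from diverse entity types, complex structures, and imbalanced distributions, which severely hinders improvements in cattle health monitoring, diagnosis, and management. This study constructed CHNERC, a Chinese NER corpus covering 17 entity categories across multiple dimensions such as the barn environment, disease prevention and control, and feed and drinking water, and proposed L-BISC, a cattle health farming NER model based on LERT multi-feature fusion. In the vector representation layer, LERT serves as the pre-trained language model, improving Chinese semantic processing while effectively capturing contextual information in long texts. In the feature extraction layer, a bidirectional long short-term memory network (BiLSTM) and an iterated dilated convolutional neural network (IDCNN) jointly extract features, strengthening sequence representation and further fusing long-distance dependencies with local features. In the feature fusion layer, scaled dot-product multi-head attention (SDPMHA) is introduced to strengthen the modeling of long-distance dependencies and improve the accuracy of entity boundary detection and category discrimination. In the decoding stage, a conditional random field (CRF) globally optimizes the label sequence to ensure structurally valid output. The results show that L-BISC achieved a precision of 90.45%, a recall of 90.76%, and an F1 score of 90.57% on the cattle health farming corpus, outperforming mainstream models such as BERT and RoBERTa. Precision exceeded 80% for all 17 entity categories, indicating that L-BISC makes effective use of textual semantics and sequential features, and the BiLSTM+IDCNN+SDPMHA multi-model feature extraction scheme contributed positively to the model's performance and feature fusion capability. This study enriches Chinese NER corpora for cattle health farming, provides a new high-precision method for NER in this domain, and can serve as a methodological reference for natural language processing tasks in related fields.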
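The feature fusion step can be illustrated with a minimal sketch of scaled dot-product multi-head self-attention, written in plain Python. This is not the authors' implementation: the abstract does not say how the BiLSTM and IDCNN outputs are combined before attention, so per-token concatenation is an assumption here, the learned query/key/value projections of a real attention layer are omitted for brevity, and the names `bilstm_feats` and `idcnn_feats` and their values are hypothetical toy data.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    # q, k, v: lists of vectors (seq_len x d). Returns (outputs, weights).
    scale = math.sqrt(len(q[0]))
    weights = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights.append(softmax(scores))
    dim = len(v[0])
    outputs = [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(dim)]
               for row in weights]
    return outputs, weights

def multi_head_attention(x, num_heads):
    # Simplification: each head attends over its own slice of the feature
    # dimension; a real layer would apply learned projections first.
    d_model = len(x[0])
    assert d_model % num_heads == 0, "feature size must divide evenly into heads"
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        out, _ = scaled_dot_product_attention(part, part, part)
        head_outputs.append(out)
    # Concatenate the heads back together for every token.
    return [sum((head_outputs[h][t] for h in range(num_heads)), [])
            for t in range(len(x))]

# Hypothetical toy features for a three-token sentence.
bilstm_feats = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]
idcnn_feats = [[0.5, 0.0], [0.1, 0.7], [0.6, 0.2]]
# Assumed fusion input: concatenate both feature streams per token.
fused_input = [b + c for b, c in zip(bilstm_feats, idcnn_feats)]
fused = multi_head_attention(fused_input, num_heads=2)
```

Each output vector is a weighted average of all token vectors in the sentence, which is how attention lets a token draw on distant context when deciding entity boundaries.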

Abstract: Health management knowledge of cattle, encompassing both beef and dairy production systems, is a fundamental component of intelligent, data-driven livestock farming. It spans multiple dimensions such as housing environment, disease prevention, breeding management, nutrition, and feed and water regulation, all of which are closely related to animal welfare, productivity, and sustainable development. However, research in cattle health management currently faces a severe shortage of high-quality Chinese textual resources, particularly annotated corpora for named entity recognition (NER), which limits the development of knowledge extraction, intelligent monitoring, and automated decision-making systems in precision livestock farming. NER in this domain presents challenges beyond those of general text, as cattle health data are characterized by highly diverse entity types, complex and nested entity structures, uneven data distributions, and frequent domain-specific terminology. These characteristics make general-purpose models such as BERT insufficient for accurate entity recognition, owing to limited domain adaptation and poor performance on long-tail or low-frequency entities. To address these challenges, this study constructed a comprehensive Chinese NER corpus for cattle health management (CHNERC) covering 17 entity categories, including diseases, drugs, feed, physiological indicators, management operations, and environmental factors, and proposed a multi-feature fusion NER model (L-BISC) based on LERT, a linguistically motivated pre-trained language model. At the representation layer, LERT was employed as the pre-trained language model to enhance Chinese semantic comprehension and effectively capture long-range contextual dependencies in cattle-domain text. At the feature extraction layer, a bidirectional long short-term memory (BiLSTM) network and an iterated dilated convolutional neural network (IDCNN) were used jointly, with BiLSTM modeling long-range dependencies and IDCNN efficiently extracting local features, so that the learned representation integrates global and local context. Furthermore, a scaled dot-product multi-head attention mechanism was introduced at the feature fusion layer to strengthen the modeling of long-distance dependencies and improve boundary and category identification, while a conditional random field (CRF) layer was applied at the decoding stage to globally optimize label sequences and ensure structural consistency of the outputs. Experimental evaluations showed that the proposed model attained a precision of 90.45%, a recall of 90.76%, and an F1-score of 90.57% on the constructed corpus, outperforming baseline models such as BERT and RoBERTa. All entity categories achieved precision above 80%, indicating strong and stable recognition capability. Ablation experiments further verified that both the multi-head attention mechanism and the combination of BiLSTM with IDCNN contributed significantly to feature fusion and overall performance. This study enriches the Chinese NER resources for cattle health management and provides a high-precision, domain-adaptive approach to entity recognition, offering methodological insights for natural language processing applications in intelligent livestock farming and related agricultural fields.
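To make the decoding stage concrete, the following is a minimal sketch of Viterbi decoding, the standard way a CRF layer selects the globally best label sequence. It is an illustration under stated assumptions rather than the authors' code: the BIO tag names (`B-DIS`, `I-DIS` for a hypothetical disease entity), the emission scores, and the single forbidden transition are invented toy values, and a trained CRF would learn its transition scores from data.

```python
NEG_INF = float("-inf")

def viterbi_decode(emissions, transitions, labels):
    # emissions: one dict per token mapping label -> score.
    # transitions: dict mapping (previous_label, current_label) -> score;
    # pairs that are absent default to 0.0.
    best = [{y: emissions[0][y] for y in labels}]
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            # Pick the previous label that gives the best path into `cur`.
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical tag set and scores for a three-token sentence.
labels = ["O", "B-DIS", "I-DIS"]
# Structural constraint: an inside tag may not directly follow "O".
transitions = {("O", "I-DIS"): NEG_INF}
emissions = [
    {"O": 2.0, "B-DIS": 0.5, "I-DIS": 0.1},
    {"O": 0.2, "B-DIS": 1.0, "I-DIS": 1.5},
    {"O": 0.1, "B-DIS": 0.2, "I-DIS": 2.0},
]
decoded = viterbi_decode(emissions, transitions, labels)
# Picking the top label per token independently would give the invalid
# sequence O, I-DIS, I-DIS; Viterbi returns O, B-DIS, I-DIS instead.
```

The example shows why sequence-level decoding matters: the per-token best labels violate the tagging scheme, while the globally optimal path respects it.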
