Abstract:
Cattle health management knowledge, spanning both beef and dairy production systems, is a fundamental component of intelligent, data-driven livestock farming. It covers housing environment, disease prevention, breeding management, nutrition, and feed and water regulation, all of which bear directly on animal welfare, productivity, and sustainable development. However, research in this area faces a severe shortage of high-quality Chinese textual resources, in particular annotated corpora for named entity recognition (NER), which limits the development of knowledge extraction, intelligent monitoring, and automated decision-making systems in precision livestock farming. NER in this domain is harder than in general text: cattle health data contain highly diverse entity types, complex and nested entity structures, uneven data distributions, and frequent domain-specific terminology. As a result, general-purpose models such as BERT, which have limited domain adaptation and perform poorly on long-tail or low-frequency entities, are insufficient for accurate entity recognition. To address these challenges, this study constructed a Chinese NER corpus for cattle health management covering 17 entity categories, including diseases, drugs, feed, physiological indicators, management operations, and environmental factors, and proposed a multi-feature fusion NER model based on LERT (Linguistically-motivated bidirectional Encoder Representation from Transformer). At the representation layer, LERT was employed as the pre-trained language model to enhance Chinese semantic comprehension and capture long-range contextual dependencies in cattle-domain text.
At the feature extraction layer, a Bidirectional Long Short-Term Memory (BiLSTM) network and an Iterated Dilated Convolutional Neural Network (IDCNN) were used jointly: BiLSTM modeled long-range dependencies while IDCNN efficiently extracted local features, yielding representations that integrate global and local context. At the feature fusion layer, a scaled dot-product multi-head attention mechanism was introduced to strengthen the modeling of long-distance dependencies and improve the identification of entity boundaries and categories. At the decoding stage, a Conditional Random Field (CRF) layer globally optimized the label sequence and ensured structurally consistent outputs. On the constructed corpus, the proposed model achieved a precision of 90.45%, a recall of 90.76%, and an F1-score of 90.57%, outperforming baseline models such as BERT and RoBERTa. Precision exceeded 80% for every entity category, indicating stable recognition across categories. Ablation experiments further verified that both the multi-head attention mechanism and the combination of BiLSTM with IDCNN contributed significantly to feature fusion and overall performance. This study enriches the Chinese NER resources available for cattle health management and provides a high-precision, domain-adaptive approach to entity recognition, offering methodological insights for natural language processing applications in intelligent livestock farming and related agricultural fields.
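
To illustrate what "globally optimizing label sequences" at the decoding stage means, the following is a minimal sketch of Viterbi decoding under BIO constraints, not the implementation used in this study. It makes several assumptions: the tag name `DIS` (for a disease entity), the emission scores, and the function names are all hypothetical, and transition scores are fixed at 0 (allowed) or negative infinity (forbidden), whereas a trained CRF would learn them from data.

```python
NEG_INF = float("-inf")  # score that makes a transition impossible


def transition_score(prev, curr):
    """Hard BIO rule (assumed): I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        if prev not in ("B-" + entity, "I-" + entity):
            return NEG_INF
    return 0.0


def viterbi_decode(emissions, labels):
    """Return the highest-scoring label sequence that respects the BIO rule.

    emissions[t][j] is the score of labels[j] at token t.
    """
    n = len(labels)
    # A sequence may not start inside an entity.
    score = [NEG_INF if labels[j].startswith("I-") else emissions[0][j]
             for j in range(n)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for reaching labels[j] at step t.
            best_i = max(
                range(n),
                key=lambda i: score[i] + transition_score(labels[i], labels[j]),
            )
            new_score.append(
                score[best_i]
                + transition_score(labels[best_i], labels[j])
                + emissions[t][j]
            )
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


def greedy_decode(emissions, labels):
    """Per-token argmax, which ignores label-to-label consistency."""
    return [labels[max(range(len(labels)), key=lambda j: row[j])]
            for row in emissions]


# Hypothetical scores for a three-token sentence.
labels = ["O", "B-DIS", "I-DIS"]
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.5],
    [0.1, 0.3, 2.0],
]
print(greedy_decode(emissions, labels))
print(viterbi_decode(emissions, labels))
```

With these toy scores, per-token argmax places `I-DIS` directly after `O`, which is not a valid BIO sequence, while the constrained decoder opens the entity with `B-DIS`. This is the kind of structural consistency that a sequence-level decoder provides over independent per-token classification.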