Abstract:
Precise identification of individual cows is a fundamental prerequisite for downstream tasks in smart animal husbandry, including precision feeding, health monitoring, and accurate estrus detection. Conventional contact-based identification methods, such as electronic ear tags and sensor collars, have been widely adopted in recent years. Nevertheless, their application is limited by high maintenance costs, susceptibility to damage, and the stress responses they induce, which compromise animal welfare. Non-contact computer vision has therefore emerged as a promising alternative. However, mainstream vision models face significant challenges in real-world breeding environments, such as variable illumination in barns, diverse cow postures during movement, and the fine-grained nature of cow facial features; they must also balance high recognition accuracy against a lightweight architecture. In this study, a lightweight cow face identification model, LCFI-Net (Lightweight Cow Face Identification Network), was proposed based on an improved DenseNet121 architecture. A three-stage structural optimization was carried out to balance model performance and computational efficiency. First, the backbone of the standard DenseNet121 was structurally pruned and streamlined into a DenseNet_Lite module, removing redundant parameters while retaining the essential feature extraction capability. Second, a Multi-Scale Attention Dense Layer (MSAD-Layer) was introduced to replace the standard dense blocks; by synergistically combining multi-scale feature fusion with attention mechanisms, it enhances the perception of key fine-grained features, such as specific facial patterns, against complex, cluttered backgrounds. Third, an Inverted Bottleneck Transition Layer (IBT-Layer) was adopted to further optimize the transmission of information between stages.
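The MSAD-Layer is described only at a high level here, as a combination of multi-scale feature fusion and attention. As a rough NumPy sketch of that idea (not the paper's actual implementation), the fragment below fuses a feature map with a 2x-pooled copy of itself and then applies a squeeze-and-excitation-style channel gate; the weight shapes and reduction ratio are illustrative assumptions:

```python
import numpy as np

def channel_attention(x, reduction=4, rng=None):
    """SE-style channel attention on a (C, H, W) feature map.

    The two fully connected weight matrices are randomly initialized
    purely for illustration; in a trained network they are learned.
    """
    c = x.shape[0]
    rng = np.random.default_rng(0) if rng is None else rng
    w1 = rng.standard_normal((c // reduction, c)) * 0.1   # squeeze FC
    w2 = rng.standard_normal((c, c // reduction)) * 0.1   # excite FC
    squeeze = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)        # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return x * gate[:, None, None]                # reweight channels

def multi_scale_fuse(x):
    """Average the original map with a 2x avg-pooled, nearest-upsampled
    copy: a crude stand-in for multi-scale feature fusion."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    upsampled = pooled.repeat(2, axis=1).repeat(2, axis=2)
    return 0.5 * (x + upsampled)

# Toy feature map: 8 channels on a 4x4 spatial grid.
feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out = channel_attention(multi_scale_fuse(feat))
print(out.shape)  # (8, 4, 4): spatial shape preserved, channels reweighted
```

The design point this illustrates is that attention reweights channels without changing the feature map's shape, so such a layer can replace a standard dense layer in-place.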
Efficient dimensionality reduction and down-sampling preserve the integrity of the image feature information, preventing the loss of critical details during feature map compression. A high-quality dataset of visible-light cow face images was collected in natural, complex breeding environments, and the improved model was trained and evaluated within a metric learning framework. Experimental results demonstrate the superior performance of the optimized architecture. On the test set, LCFI-Net achieved a recognition accuracy of 93.54%, an improvement of 2.04 percentage points over the baseline DenseNet121. More significantly, the computational cost was substantially reduced: LCFI-Net contains only 1.02 M parameters, 6.07 M fewer than the original DenseNet121. Furthermore, comparative experiments against mainstream lightweight and heavyweight models validated the robustness of LCFI-Net: its accuracy exceeded that of MobileNetV2, ShuffleNetV2, MobileFaceNet, ResNet50, and ResNet18 by 4.50, 4.46, 4.08, 2.75, and 2.29 percentage points, respectively, and its structure offered the best trade-off between accuracy and speed. In conclusion, LCFI-Net achieves an effective equilibrium between recognition precision and computational efficiency. These findings provide a solid technical foundation for deploying high-precision cow identity recognition on resource-constrained edge devices, such as mobile inspection robots and intelligent barn equipment.
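The abstract states that the model is evaluated within a metric learning framework. A common identification scheme in such frameworks (not necessarily the exact protocol used here) matches a query face embedding against an enrolled gallery by cosine similarity; the embeddings and cow IDs below are invented for illustration:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Project embeddings onto the unit hypersphere so that the dot
    product equals cosine similarity."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def identify(query_emb, gallery_embs, gallery_ids):
    """Return the ID whose gallery embedding has the highest cosine
    similarity to the query embedding (nearest-neighbor matching)."""
    sims = l2_normalize(gallery_embs) @ l2_normalize(query_emb)
    return gallery_ids[int(np.argmax(sims))]

# Toy gallery: one reference embedding per enrolled cow (IDs invented).
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
ids = ["cow_017", "cow_042", "cow_099"]

# A query embedding close to the second reference matches cow_042.
query = np.array([0.1, 0.9, 0.05])
print(identify(query, gallery, ids))  # cow_042
```

One advantage of this gallery-matching design is that new animals can be enrolled by adding a single reference embedding, without retraining the backbone network.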