Cow recognition method in milking scene based on back features
Abstract
Accurate identification of individual dairy cows is the cornerstone of intelligent farming systems, particularly in milking scenes, where fast and precise identification improves management efficiency while safeguarding animal welfare. Traditional techniques such as branding and ear tagging can injure the animals and impair their well-being, and have gradually been phased out as technology has advanced. Radio Frequency Identification (RFID) remains widely adopted, but it is limited by device fragility and animal discomfort. In contrast, contactless recognition based on computer vision offers low cost and high efficiency, making it better suited to milking scenes. To improve management efficiency and reduce operational costs, a real-time recognition method based on the back features of dairy cows was proposed. Cameras captured images of cows in the milking scene for identification, which reduced the probability of missed and false detections and avoided harm to the animals. Videos of milking operations were collected and decomposed into frames, yielding an object detection dataset of 793 annotated images. Lightweight architectural improvements to YOLOv8 produced the YOLOv8-DW network. In comparisons against multiple YOLOv8 variants, YOLOv8-DW performed best in target detection and image extraction, achieving a precision of 98.9%, a recall of 96.4%, and a mean average precision of 99.3%. On an NVIDIA GeForce RTX 3090 (24 GB) GPU, YOLOv8-DW reached a detection speed of 169.49 frames per second.
This gain in inference speed can be attributed to the lightweight design of YOLOv8-DW, which maintains high accuracy while substantially reducing computational complexity relative to the original YOLOv8 model. Applying YOLOv8-DW to the milking videos produced an identity recognition dataset containing 3145 unique cow identities across 52834 images. This dataset was used to train and compare five backbone networks: HRNet, EfficientNet, ConvNeXt, Swin Transformer, and Swin Transformer V2. The five networks were evaluated under three testing protocols: the standard test set, milking videos recorded at different times, and a specialized dataset of 79 cows with solid black backs. After careful analysis, HRNet was selected as the most suitable model for this application. HRNet performed best at extracting the distinctive back features of the target cows, which were compared with those of the cows in the database by cosine similarity. On the test set, HRNet achieved a mean average precision of 99.76% and a Rank@1 accuracy of 100%, while processing images at 120.12 frames per second. Recognition thresholds were established by kernel density estimation of the similarity score distributions of correct and incorrect identifications. Multi-frame detection combined with weighted voting based on this threshold achieved final recognition rates of 97.7% for registered cows and 94.0% for unregistered cows. The method achieved real-time identity recognition of dairy cows in milking scenes, improving management efficiency and reducing farming costs while protecting animal welfare, and provides important technical support for the intelligent farming of dairy cows.
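The matching stage described above (cosine similarity against a database of registered cows, a recognition threshold, and multi-frame weighted voting) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `cosine_similarity_matrix` and `identify`, the toy three-dimensional feature vectors, the threshold value of 0.6, and the weighting scheme (each frame votes for its best match with a weight equal to its similarity, and frames below the threshold are discarded) are all assumptions made for illustration. In the described method the features would come from the HRNet backbone and the threshold from kernel density estimation.

```python
import numpy as np


def cosine_similarity_matrix(queries, gallery):
    """Cosine similarity between every query row and every gallery row.

    Assumes no feature vector is all zeros.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


def identify(frame_features, gallery_features, gallery_ids, threshold):
    """Return the winning registered ID, or None if no frame passes the threshold.

    Hypothetical voting rule: each frame votes for its most similar gallery
    entry, weighted by that similarity; frames whose best similarity falls
    below the threshold cast no vote.
    """
    sims = cosine_similarity_matrix(frame_features, gallery_features)
    best = sims.argmax(axis=1)                      # best gallery index per frame
    best_sims = sims[np.arange(len(best)), best]    # similarity of that best match

    votes = {}
    for idx, sim in zip(best, best_sims):
        if sim >= threshold:
            cow_id = gallery_ids[idx]
            votes[cow_id] = votes.get(cow_id, 0.0) + float(sim)

    if not votes:
        return None  # treated as an unregistered cow
    return max(votes, key=votes.get)


# Toy data (illustrative only): two registered cows and three frames of one visit.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
ids = ["cow_A", "cow_B"]
frames = np.array([[0.9, 0.1, 0.0],
                   [0.8, 0.2, 0.1],
                   [0.1, 0.9, 0.0]])

print(identify(frames, gallery, ids, threshold=0.6))
```

In this toy case two of the three frames match `cow_A` above the threshold, so their combined weight outvotes the single frame matching `cow_B`. A query whose features resemble no registered cow yields `None`, which corresponds to the unregistered case.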