Abstract:
With growing demand for sheep products such as meat, milk, and wool, large-scale, intensive sheep farming has become an inevitable trend in modern animal husbandry. In this context, accurately identifying estrous ewes is essential for effective reproductive management: precise estrus detection allows farmers to capture the optimal breeding window, improving conception rates and overall reproductive efficiency. Manual observation, however, is labor-intensive, time-consuming, and prone to errors from subjective judgment, which often leads to missed estrus windows. The problem is especially acute on large-scale farms, where continuous monitoring is essential. To address the high miss rates, low efficiency, and difficulty of sustained surveillance associated with manual estrus detection, this study proposed an automated method for detecting and tracking estrous ewes based on an improved YOLOv11n detector and the OSTrack tracker. The vision-based solution operated effectively under real-world farm conditions, robustly handling complex scenarios such as high animal density, strong visual similarity among individuals, and frequent occlusion. The proposed framework integrated two key components in a complementary manner. First, standard convolutions in the YOLOv11n network were replaced with space-to-depth convolution (SPDConv). This modification reduced model complexity and computational load while preserving spatial information during downsampling, which is essential for detecting fine-grained behavioral cues such as mounting postures or tail raising. Unlike pooling or strided convolutions, which discard fine-grained spatial detail, SPDConv rearranges the pixels of each spatial block into the channel dimension, so the downsampling step itself loses no information and adds no learnable parameters.
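The space-to-depth step can be illustrated with a minimal NumPy sketch. It assumes a single (C, H, W) feature map with even H and W and a scale of 2, as in the general SPD-Conv design; the 1×1 convolution that stands in for the subsequent non-strided convolution, and its weight matrix, are illustrative assumptions rather than the layer configuration used in this study.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale)."""
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Each slice takes every `scale`-th pixel starting at offset (i, j);
    # together the slices cover every input pixel exactly once.
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)


def spd_conv(x: np.ndarray, weight: np.ndarray, scale: int = 2) -> np.ndarray:
    """Space-to-depth followed by a stride-1 1x1 convolution.

    weight: (C_out, C * scale**2), a hypothetical stand-in for learned weights.
    """
    y = space_to_depth(x, scale)
    # A 1x1, stride-1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weight, y)
```

Because every input pixel lands in exactly one output channel, the rearrangement is lossless; any information reduction happens only in the learned convolution that follows, unlike a stride-2 convolution that skips positions outright.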
Second, the Triplet Attention mechanism was incorporated into the Neck of the network to jointly capture dependencies across the channel, height, and width dimensions. This improved the model's ability to extract discriminative features and distinguish subtle postural differences, particularly in crowded scenes where occlusion and appearance ambiguity are common. By attending to informative regions in three orthogonal views, the network became more sensitive to spatial context and alignment, which was crucial for identifying estrus in visually homogeneous flocks. Finally, the optimized YOLOv11n detector was combined with the OSTrack single-object tracker to form the complete YOLO-OSTrack framework. In this design, the detector provided the initial high-confidence bounding box for a ewe labeled 'Estrus', which was then used to initialize the tracker. Subsequent frames were processed by OSTrack alone, enabling continuous, efficient localization without repeated detection calls. This detection-tracking decoupling strategy reduced inference latency and maintained temporal smoothness and identity consistency over long sequences. Experimental results demonstrated the effectiveness of the approach. The improved YOLOv11n achieved an F1-score of 93.0% and an average precision (AP) of 98.0% for mounting behavior and 93.4% for estrous ewes, gains of 1.1, 0.5, and 2.0 percentage points, respectively, over the baseline model. The model is lightweight, with 2.2 M parameters, a model size of 4.5 MB, and a computational cost of 5.6 GFLOPs, reductions of 15.4%, 13.5%, and 12.5%, respectively, relative to the baseline, which makes it well suited to edge devices with limited memory and processing power. In tracking, OSTrack achieved a success rate (area under the curve, AUC) of 85.1%, a precision (P) of 87.0%, and a normalized precision of 96.1%, demonstrating robustness to high-density flocks, pose variation, strong visual similarity among individuals, and partial occlusion. Overall, the YOLO-OSTrack framework can accurately detect and stably track estrous ewes over long periods in practical farming environments, offering reliable technical support for real-time monitoring, early warning, individualized precision management, and improved reproductive efficiency in modern intelligent livestock systems, thereby contributing to more sustainable, data-driven sheep farming.
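The Triplet Attention module described above can be sketched in NumPy as follows. This assumes the three-branch design of the original Triplet Attention proposal: each branch rotates the tensor, applies Z-pooling (max and mean) over the leading axis, a k×k convolution, and sigmoid gating, and the three branch outputs are averaged. Batch normalization and training are omitted, and the kernels passed in are hypothetical placeholders for learned weights; this is not the Neck implementation used in the study.

```python
import numpy as np


def z_pool(x: np.ndarray) -> np.ndarray:
    """Stack max- and mean-pooling over the leading axis: (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Two-input, one-output 'same'-padded 2-D cross-correlation.

    x: (2, A, B); kernel: (2, k, k) with k odd. Returns an (A, B) map.
    """
    _, a, b = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps the output size
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) from both pooled channels.
            out += np.einsum("c,cab->ab", kernel[:, i, j], xp[:, i:i + a, j:j + b])
    return out


def attention_gate(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reweight a (D, A, B) tensor by a sigmoid attention map over its (A, B) plane."""
    logits = conv2d_same(z_pool(x), kernel)
    return x * (1.0 / (1.0 + np.exp(-logits)))  # the (A, B) map broadcasts over D


def triplet_attention(x: np.ndarray, kernels) -> np.ndarray:
    """Triplet Attention on a (C, H, W) map; `kernels` holds three (2, k, k) arrays."""
    k_cw, k_hc, k_hw = kernels
    # Branch 1: rotate so H is pooled away; attention spans the (C, W) plane.
    out_cw = attention_gate(x.transpose(1, 0, 2), k_cw).transpose(1, 0, 2)
    # Branch 2: rotate so W is pooled away; attention spans the (H, C) plane.
    out_hc = attention_gate(x.transpose(2, 1, 0), k_hc).transpose(2, 1, 0)
    # Branch 3: pool over C; ordinary spatial attention over the (H, W) plane.
    out_hw = attention_gate(x, k_hw)
    # Average the three branch outputs.
    return (out_cw + out_hc + out_hw) / 3.0
```

Each branch pairs two of the three dimensions in its attention map, which is how the module captures channel-height, channel-width, and height-width interactions without a separate fully connected channel-attention layer.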
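The detection-then-tracking control flow of YOLO-OSTrack can be sketched as below. The `detect` and `make_tracker` callables, the `Detection` record, the box convention, and the 0.5 confidence threshold are hypothetical stand-ins for the improved YOLOv11n detector and OSTrack; no re-detection or track-loss handling is shown because the abstract does not describe any.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); hypothetical convention


@dataclass
class Detection:
    label: str
    score: float
    box: Box


def detect_then_track(
    frames: Iterable,
    detect: Callable[[object], List[Detection]],
    make_tracker: Callable[[object, Box], Callable[[object], Box]],
    conf_thresh: float = 0.5,
) -> Iterator[Tuple[int, Optional[Box]]]:
    """Run the detector only until an 'Estrus' ewe is found, then track it alone."""
    track: Optional[Callable[[object], Box]] = None
    for idx, frame in enumerate(frames):
        if track is None:
            # Detection phase: keep only confident 'Estrus' boxes.
            hits = [d for d in detect(frame) if d.label == "Estrus" and d.score >= conf_thresh]
            if not hits:
                yield idx, None
                continue
            best = max(hits, key=lambda d: d.score)
            # Initialize the tracker with the highest-confidence box.
            track = make_tracker(frame, best.box)
            yield idx, best.box
        else:
            # Tracking phase: the detector is no longer called.
            yield idx, track(frame)
```

Once the tracker is initialized, each frame costs one tracker step instead of a full detection pass, which is the source of the latency saving the decoupling strategy targets.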
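The reported percentage reductions follow the usual relative-change formula, where M is the improved model's figure and B the baseline's. Inverting it gives the approximate baseline values the percentages imply; these are back-calculated here rather than stated in the abstract, and are approximate because the reported figures are rounded:

```latex
r = \frac{B - M}{B} \quad\Longrightarrow\quad B = \frac{M}{1 - r}

B_{\text{params}} \approx \frac{2.2\,\text{M}}{1 - 0.154} \approx 2.6\,\text{M}, \qquad
B_{\text{size}} \approx \frac{4.5\,\text{MB}}{1 - 0.135} \approx 5.2\,\text{MB}, \qquad
B_{\text{compute}} \approx \frac{5.6\,\text{GFLOPs}}{1 - 0.125} = 6.4\,\text{GFLOPs}
```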
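The three tracking metrics are commonly computed as in OTB/LaSOT-style protocols, sketched below: success AUC averages the fraction of frames whose IoU exceeds each threshold in [0, 1]; precision is the fraction of frames whose center error is within 20 pixels; normalized precision scales the center error by the ground-truth box size and averages over thresholds in [0, 0.5]. The abstract does not state the exact protocol used, so the thresholds and step counts here are assumptions.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) with (x, y) the top-left corner


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center(b: Box) -> Tuple[float, float]:
    return b[0] + b[2] / 2, b[1] + b[3] / 2


def success_auc(preds: Sequence[Box], gts: Sequence[Box], steps: int = 21) -> float:
    """Mean fraction of frames with IoU strictly above each threshold in [0, 1]."""
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds: Sequence[Box], gts: Sequence[Box], pixels: float = 20.0) -> float:
    """Fraction of frames whose center error is at most `pixels`."""
    errs = [math.dist(center(p), center(g)) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errs) / len(errs)


def normalized_precision_auc(
    preds: Sequence[Box], gts: Sequence[Box], max_thresh: float = 0.5, steps: int = 51
) -> float:
    """Center error scaled by ground-truth width/height, averaged over [0, max_thresh]."""
    errs = []
    for p, g in zip(preds, gts):
        (px, py), (gx, gy) = center(p), center(g)
        errs.append(math.hypot((px - gx) / g[2], (py - gy) / g[3]))
    thresholds = [max_thresh * i / (steps - 1) for i in range(steps)]
    rates = [sum(e <= t for e in errs) / len(errs) for t in thresholds]
    return sum(rates) / len(rates)
```

Under these definitions a perfectly tracked sequence scores 20/21 rather than 1 on success AUC, because the strict `>` comparison fails at the IoU threshold of 1; this follows the common OTB convention but matters when comparing numbers across toolkits.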