An approach for detecting and tracking estrous ewes based on improved YOLOv11n and OSTrack

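  • Summary: To address the high miss rate, low efficiency, and difficulty of continuous monitoring associated with manual estrus checking in ewe husbandry, this study proposes an automated detection and tracking method for estrous ewes based on an improved YOLOv11n and OSTrack. First, space-to-depth convolution (SPDConv) replaces the standard convolutions in the YOLOv11n network, reducing model complexity while preserving spatial detail during downsampling. Second, the Triplet Attention mechanism is integrated into the neck to strengthen feature extraction and posture discrimination in dense flocks of similar-looking sheep. Finally, the improved YOLOv11n is combined with the OSTrack tracker, with the detected bounding box of the estrous ewe serving as the tracker's initial input, to build the YOLO-OSTrack framework for detecting and tracking estrous ewes. Experimental results show that, for detection, the improved YOLOv11n model reaches an F1 score of 93.0%, an average precision of 98.0% for mounting behavior, and an average precision of 93.4% for estrous ewes, improvements of 1.1, 0.5, and 2.0 percentage points, respectively, over the baseline YOLOv11n model. The model has 2.2 M parameters, 5.6 G floating-point operations, and a size of 4.5 MB, reductions of 15.4%, 12.5%, and 13.5%, respectively, relative to the baseline YOLOv11n. For tracking, the OSTrack model achieves a success rate (area under curve, AUC) of 85.1%, a precision (P) of 87.0%, and a normalized precision of 96.1%. The proposed YOLO-OSTrack framework delivers accurate detection and continuous tracking of estrous ewes on production sheep farms and can provide reliable technical support for real-time monitoring and early warning, precise management of individual animals, and optimization of reproductive efficiency.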

     

    Abstract: As demand for sheep products such as meat, milk, and wool continues to grow, large-scale, intensive sheep farming has become an inevitable trend in modern animal husbandry. In this context, accurately identifying estrous ewes is essential for effective reproductive management. Precise estrus detection allows farmers to capture the optimal breeding window, thereby improving conception rates and overall reproductive efficiency. However, manual observation is labor-intensive, time-consuming, and prone to errors of subjective judgment, often resulting in missed estrus windows. The problem is particularly pronounced in large-scale sheep farming, where continuous monitoring is required. To address the high miss rates, low efficiency, and difficulty of sustained surveillance associated with manual estrus detection, this study proposed an automated detection and tracking method for estrous ewes based on an improved YOLOv11n and OSTrack. The vision-based solution operated under real-world farm conditions and handled complex scenarios such as high animal density, strong visual similarity among individuals, and frequent occlusion. The proposed framework combined two complementary components, a detector and a tracker, and was built in three steps. First, standard convolutions in the YOLOv11n network were replaced with space-to-depth convolution (SPDConv). This modification reduced model complexity and computational load while preserving spatial information during downsampling, which was essential for detecting fine-grained behavioral cues such as mounting postures or tail raising. Unlike pooling or strided convolutions, which discard spatial resolution, SPDConv reorganized spatial pixel information into the channel dimension, thereby maintaining structural fidelity without adding parameters.
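The space-to-depth rearrangement behind SPDConv can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `space_to_depth`, the single-channel nested-list input, the scale of 2, and the channel ordering are all assumptions made for illustration, and the non-strided convolution that normally follows the rearrangement is left out.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W grid into scale*scale grids of size (H/scale) x (W/scale).

    Every input value is kept; it only moves from a spatial position
    to a channel, so no information is discarded while downsampling.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    channels = []
    for dy in range(scale):        # row offset inside each scale x scale block
        for dx in range(scale):    # column offset inside each block
            channels.append([
                [feature_map[y][x] for x in range(dx, width, scale)]
                for y in range(dy, height, scale)
            ])
    return channels


# A 4 x 4 single-channel map becomes four 2 x 2 channels (assumed ordering).
grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
channels = space_to_depth(grid)
```

Here the spatial resolution is halved, as with a stride-2 convolution, but all 16 input values survive in the four output channels.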
Second, the Triplet Attention mechanism was incorporated into the neck of the network to jointly capture dependencies across the channel, height, and width dimensions. This enhancement improved the model's ability to extract discriminative features and distinguish subtle postural differences, particularly in crowded scenes where occlusion and appearance ambiguity are common. By attending to informative regions along three orthogonal views, the network became more sensitive to context and alignment, which was crucial for identifying estrus in visually homogeneous flocks. Finally, the optimized YOLOv11n detector was combined with the OSTrack single-object tracker to form the complete YOLO-OSTrack framework. In this design, the detector provided the initial high-confidence bounding box for a ewe labeled 'Estrus', which was then used to initialize the tracker. Subsequent frames were processed by OSTrack alone, enabling continuous and efficient localization without repeated detection calls. This decoupling of detection and tracking not only reduced inference latency but also ensured temporal smoothness and identity consistency over long sequences. Experimental results demonstrated the effectiveness of the approach. The improved YOLOv11n achieved an F1-score of 93.0%, an average precision (AP) of 98.0% for mounting behavior, and an AP of 93.4% for estrous ewes, gains of 1.1, 0.5, and 2.0 percentage points, respectively, over the baseline model. The proposed model was lightweight, with 2.2 M parameters, a model size of 4.5 MB, and a computational cost of 5.6 GFLOPs. These figures represented reductions of 15.4%, 13.5%, and 12.5%, respectively, compared to the baseline, making the model well suited to deployment on edge devices with limited memory and processing power.
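The detect-once, then-track hand-off described above can be sketched as follows. This is only an illustration under stated assumptions: `run_pipeline`, `toy_detect`, `ShiftTracker`, the `(label, confidence, box)` tuple format, and the 0.5 confidence threshold are hypothetical stand-ins, not the YOLOv11n or OSTrack APIs and not values reported in the study.

```python
def run_pipeline(frames, detect, make_tracker, conf_threshold=0.5):
    """Run the detector until an 'Estrus' box is found, then use the tracker only.

    detect(frame) returns a list of (label, confidence, box) tuples and
    make_tracker(frame, box) returns an object with update(frame) -> box.
    Both interfaces are hypothetical.
    """
    tracker = None
    boxes = []
    detector_calls = 0
    for frame in frames:
        if tracker is None:
            detector_calls += 1
            candidates = [d for d in detect(frame)
                          if d[0] == "Estrus" and d[1] >= conf_threshold]
            if not candidates:
                boxes.append(None)               # nothing to track yet
                continue
            _, _, box = max(candidates, key=lambda d: d[1])
            tracker = make_tracker(frame, box)   # initialise with the best box
            boxes.append(box)
        else:
            boxes.append(tracker.update(frame))  # tracker only, no detection
    return boxes, detector_calls


class ShiftTracker:
    """Toy tracker that moves the box one pixel to the right per frame."""

    def __init__(self, frame, box):
        self.box = box

    def update(self, frame):
        x, y, w, h = self.box
        self.box = (x + 1, y, w, h)
        return self.box


def toy_detect(frame):
    # Toy detector: low confidence on frames 0 and 1, confident from frame 2.
    if frame < 2:
        return [("Estrus", 0.3, (10, 20, 30, 40))]
    return [("Estrus", 0.8, (10, 20, 30, 40)), ("Estrus", 0.6, (50, 50, 30, 40))]


boxes, detector_calls = run_pipeline(range(5), toy_detect, ShiftTracker)
```

In this toy run the detector is called on the first three frames only; once a box passes the threshold, every later frame is handled by the tracker.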
In tracking, OSTrack achieved a success rate (area under curve, AUC) of 85.1%, a precision (P) of 87.0%, and a normalized precision of 96.1%, demonstrating its robustness to high-density flocks, pose variation, high visual similarity among individuals, and partial occlusion. Overall, the YOLO-OSTrack framework can accurately detect and stably track estrous ewes over long periods in practical farming environments. It offers reliable technical support for real-time monitoring, early warning, individualized precision management, and the optimization of reproductive efficiency in intelligent livestock systems, contributing to more sustainable and data-driven sheep farming.
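The abstract does not state which evaluation protocol produced the tracking figures. The sketch below assumes the conventions commonly used in single-object tracking benchmarks: success is the area under the curve of the fraction of frames whose IoU exceeds thresholds from 0 to 1, and precision is the fraction of frames whose center error is at most 20 pixels. The function names, the `(x, y, w, h)` box format, and both defaults are assumptions. Normalized precision, which scales the center error by the ground-truth box size, is omitted for brevity.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def success_auc(preds, truths, steps=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    overlaps = [iou(p, t) for p, t in zip(preds, truths)]
    rates = [sum(o > th for o in overlaps) / len(overlaps) for th in thresholds]
    return sum(rates) / len(rates)


def precision(preds, truths, max_error=20.0):
    """Fraction of frames whose center error is within max_error pixels."""
    errors = [center_error(p, t) for p, t in zip(preds, truths)]
    return sum(e <= max_error for e in errors) / len(errors)


# Two-frame toy sequence: one perfect prediction, one complete miss.
truths = [(0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (100, 100, 10, 10)]
auc = success_auc(preds, truths)
prec = precision(preds, truths)
```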

     
