Abstract:
Precise tracking and behavior recognition in group-housed swine are critical for intelligent livestock farming. However, in real farming environments, object detection and tracking present significant challenges due to crowding, occlusion, and illumination variation. To address these issues, this study proposed a multi-object tracking and behavior analysis framework named SR-YOLO+OC-SORT, which integrated an improved YOLOv8n detector with a robust tracking module. Firstly, the original C2f module of YOLOv8n was replaced with a spatially aware C2f_SGE module. C2f_SGE retained the efficient feature extraction capability of the C2f module while introducing a spatial enhancement mechanism that strengthened the spatial and semantic representation of feature maps, significantly suppressing noise regions and enhancing the response of informative regions under uneven lighting and occlusion. Moreover, a lightweight RepGFPN structure was introduced into the neck to enhance feature fusion and detection robustness under complex conditions. Secondly, the detection results from SR-YOLO were fed into the OC-SORT tracker, which effectively maintained pig identity consistency even in challenging scenarios involving severe occlusion, dense groupings, and varying illumination. Finally, an automatic behavior monitoring algorithm was designed by combining behavior categories with the OC-SORT tracking trajectories, enabling time-based analysis of four typical pig behaviors (stand, lie, eat, and other). To validate the effectiveness of the proposed method, experiments were conducted on two datasets: a public dataset and a private dataset. The public dataset comprised 10 video segments, with 6 used for training and validation and 4 for testing. The private dataset originated from a commercial pig farm in Foshan City and consisted of twelve 1-minute video clips.
Among these, 8 video sequences were used for training and 4 for testing. All datasets were captured using fixed overhead cameras at a video resolution of 2688 × 1520 pixels. Each video was recorded and annotated at 5 frames per second, enabling stable documentation of pigs' behavioral activities within the pens. To ensure behavioral diversity, key frames were extracted from the raw videos using FFmpeg 6.0, and the different behaviors were precisely annotated with the DarkLabel tool. The dataset exhibited significant diversity across multiple dimensions, including pig body size, stocking density, and housing environment. This diversity made behavioral analyses based on the dataset more universally applicable and valuable for validation, thereby effectively supporting intelligent management in practical farming operations. Experimental results showed that the proposed method achieved superior performance in both detection and tracking. In detection, SR-YOLO achieved an mAP@0.5 (mean average precision at an IoU threshold of 0.5) of 90.1% and an F1-score of 84.4% on the public dataset, and an mAP@0.5 of 85.6% and an F1-score of 83.7% on the private dataset, outperforming mainstream detectors such as YOLOv5, YOLOv6, and YOLOv10. In multi-object tracking, the SR-YOLO+OC-SORT framework outperformed classical approaches such as ByteTrack and BoT-SORT, achieving 83.2% HOTA (higher order tracking accuracy), 94.0% MOTA (multiple object tracking accuracy), and 92.0% IDF1 (identity F1-score) on the public dataset, and 85.2%, 96.7%, and 96.8% for HOTA, MOTA, and IDF1 on the private dataset, respectively. Furthermore, based on the behavior tracking information, individual pig behaviors were monitored and analyzed over time. The experimental results demonstrate that the proposed detection and tracking framework achieves superior accuracy and robust performance under diverse real-world conditions, providing a scalable technical solution for the automatic monitoring of pig behavior in intelligent farming systems.
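The time-based behavior analysis described above can be illustrated with a minimal sketch: at the stated 5 fps annotation rate, each frame in which a tracked pig is assigned a behavior class contributes 1/5 s to that pig's running total for that behavior. The input format (tuples of frame index, track ID, behavior label) and the function name are assumptions for illustration, not the paper's actual implementation.

```python
from collections import defaultdict

FPS = 5  # videos are annotated at 5 frames per second (from the dataset description)

def accumulate_behavior_time(tracks):
    """Accumulate per-pig behavior durations (in seconds) from tracking output.

    `tracks` is assumed to be an iterable of (frame_idx, track_id, behavior)
    tuples, where `behavior` is one of the four classes used in the paper:
    'stand', 'lie', 'eat', 'other'. Frame counts are converted to seconds
    once at the end to avoid accumulating floating-point error.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for frame_idx, track_id, behavior in tracks:
        counts[track_id][behavior] += 1
    return {tid: {beh: n / FPS for beh, n in per_pig.items()}
            for tid, per_pig in counts.items()}

# Hypothetical example: pig 1 lies for 3 frames, then stands for 2 frames.
demo = [(0, 1, "lie"), (1, 1, "lie"), (2, 1, "lie"),
        (3, 1, "stand"), (4, 1, "stand")]
print(accumulate_behavior_time(demo))  # {1: {'lie': 0.6, 'stand': 0.4}}
```

In a full pipeline, the tuples would come from associating each OC-SORT track with the behavior class predicted by the detector for that bounding box in that frame.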