Abstract:
Precise tracking and behavior recognition of group-housed swine are critical for intelligent livestock farming. However, object detection and tracking face significant challenges in real farming environments due to pig crowding, occlusion, and illumination variation. In this study, a multi-object tracking and behavior analysis framework (named SR-YOLO+OC-SORT) was proposed, integrating an improved YOLOv8n detector with a robust tracking module. Firstly, the original C2f module of YOLOv8n was replaced with a spatially aware C2f_SGE module, which retains the efficient feature extraction of C2f while introducing a spatial enhancement mechanism to strengthen the spatial information and semantic expression of the feature maps. Noise regions were significantly suppressed, enhancing the response of effective feature areas under uneven lighting and occlusion. Moreover, a lightweight RepGFPN structure was introduced into the neck to enhance feature fusion and detection robustness under complex conditions. Secondly, detections from SR-YOLO were fed into the OC-SORT tracker, which effectively maintained pig identities even in challenging scenarios such as severe occlusion, dense grouping, and varying illumination. Finally, a behavior monitoring algorithm was designed to combine behavior categories with the OC-SORT tracking trajectories, enabling time-based analysis of four typical pig behaviors (standing, lying, eating, and other). A series of experiments was conducted on two datasets, one public and one private, to validate the effectiveness of the framework. The public dataset comprises 10 video segments: 6 for training and validation and 4 for testing. The private dataset originated from a commercial pig farm in Foshan City, Guangdong Province, China.
It comprises 12 one-minute video clips, with 8 sequences for training and 4 for testing. All videos were captured by fixed overhead cameras at a resolution of 2688×1520 pixels and annotated at a rate of 5 frames per second, providing stable documentation of the pigs' behavioral activities within the pens. Key frames were extracted from the raw videos using FFmpeg 6.0 to ensure behavioral diversity, and the different behaviors were precisely annotated with the DarkLabel tool. The dataset exhibits significant diversity across multiple dimensions, including pig body size, stocking density, and housing environment, making the behavioral analysis more universally applicable and valuable in practice. Experimental results showed superior performance in both detection and tracking tasks. For detection, SR-YOLO achieved 90.1% mAP@0.5 (mean average precision at an IoU threshold of 0.5) and an 84.4% F1-score on the public dataset, and 85.6% mAP@0.5 and an 83.7% F1-score on the private dataset, outperforming mainstream detectors such as YOLOv5, YOLOv6, and YOLOv10. For multi-object tracking, the SR-YOLO+OC-SORT framework also outperformed classical approaches such as ByteTrack and BoT-SORT, achieving 83.2% HOTA (higher order tracking accuracy), 94.0% MOTA (multiple-object tracking accuracy), and 92.0% IDF1 (identity F1-score) on the public dataset, and 85.2% HOTA, 96.7% MOTA, and 96.8% IDF1 on the private dataset. Furthermore, individual pig behaviors were monitored and analyzed over time according to the behavior tracking information.
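The frame-extraction step described above can be sketched as follows. This is a minimal illustration, assuming the standard FFmpeg CLI with its `fps` filter; the function names and output filename pattern are hypothetical, and the 5 fps rate matches the annotation rate stated in the text:

```python
import subprocess

def build_ffmpeg_cmd(video_path, out_dir, fps=5):
    """Build an FFmpeg command that samples frames at a fixed rate.

    The 5 fps default matches the annotation rate described in the text;
    the zero-padded output pattern is an illustrative choice.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",          # sample `fps` frames per second
        f"{out_dir}/frame_%05d.jpg",  # frame_00001.jpg, frame_00002.jpg, ...
    ]

def extract_frames(video_path, out_dir, fps=5):
    # Runs FFmpeg; requires an ffmpeg binary on the system PATH.
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)
```

The sampled frames would then be imported into DarkLabel for bounding-box and behavior annotation.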
The experimental results demonstrate that the proposed detection and tracking framework achieves superior accuracy and robust performance under diverse real-world conditions. These findings provide a scalable technical solution for the effective monitoring of pig behavior in intelligent farming.
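As a minimal sketch of the time-based behavior analysis described above, the following accumulates per-pig behavior durations from per-frame tracking output. The record format and function name are assumptions (the paper does not specify them): one `(track_id, behavior)` record per tracked pig per annotated frame, at the stated 5 fps rate, so each frame contributes 0.2 s:

```python
from collections import defaultdict

FPS = 5  # annotation rate from the text: each frame spans 1/5 = 0.2 s
BEHAVIORS = ("stand", "lie", "eat", "other")

def behavior_durations(records, fps=FPS):
    """Accumulate seconds spent in each behavior, per track ID.

    `records` is an iterable of (track_id, behavior) tuples, one per
    tracked pig per annotated frame (a hypothetical format).
    """
    seconds = defaultdict(lambda: {b: 0.0 for b in BEHAVIORS})
    for track_id, behavior in records:
        seconds[track_id][behavior] += 1.0 / fps
    return dict(seconds)

# Usage: pig 1 lies for 3 frames (0.6 s), then eats for 2 frames (0.4 s).
recs = [(1, "lie"), (1, "lie"), (1, "lie"), (1, "eat"), (1, "eat")]
durations = behavior_durations(recs)
```

Because OC-SORT maintains stable identities through occlusion, such per-ID accumulation yields continuous behavior time budgets for each pig rather than fragmented segments.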