Abstract:
The precise monitoring of perinatal behaviors in ewes is crucial for improving reproductive efficiency, reducing the risk of dystocia, and optimizing farm management in intelligent livestock systems. However, the automatic recognition of perinatal behaviors from video data remains a major challenge due to their long temporal duration, subtle motion patterns, and high inter-class similarity. To address these issues, this study aimed to develop a robust and efficient multi-modal skeleton-based behavior recognition framework using an improved Two-Stream Adaptive Graph Convolutional Network (2s-AGCN). In this work, we designed a novel graph-based deep learning model capable of capturing both spatial and temporal dependencies in complex ewe behaviors. First, to overcome the limited adaptability of a fixed skeletal topology in representing subtle or dynamic motion changes, a Weighted Adaptive Graph Convolution Layer (W-AGCL) was introduced. This layer dynamically adjusted the connection strengths between skeletal nodes, enabling the model to adaptively learn the most informative spatial relationships according to behavioral context. The adaptive weighting mechanism enhanced the model's sensitivity to variations in spatial structure and improved its robustness to noise and individual differences among ewes. Second, to address the uneven temporal distribution and the presence of micro-movements in perinatal behaviors, we developed a Spatial Temporal Enhanced Attention Module (STE). This module selectively emphasized critical spatiotemporal regions by assigning higher attention weights to frames and joints carrying discriminative information, thereby improving the network's ability to capture subtle but behaviorally significant cues. Furthermore, a four-stream graph convolutional architecture was proposed to perform comprehensive multi-modal feature fusion.
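The W-AGCL described above can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names, the toy skeleton, and the choice of a single learnable elementwise weight matrix M over the normalized adjacency are assumptions made here for illustration (the original 2s-AGCN additionally combines fixed, learned, and data-dependent adjacency components).

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A + I, the standard GCN preprocessing step."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def weighted_adaptive_gcn_forward(X, A, M, W):
    """
    Hypothetical forward pass of a weighted adaptive graph convolution.
    X: (V, C_in) per-joint features; A: (V, V) skeletal adjacency;
    M: (V, V) learnable edge weights (updated by backprop in training);
    W: (C_in, C_out) feature transform.
    The elementwise product A_norm * M lets the layer strengthen or
    suppress individual skeletal connections per behavior context.
    """
    A_norm = normalize_adjacency(A)
    return (A_norm * M) @ X @ W

# Toy 3-joint chain skeleton (hypothetical, for shape checking only).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.standard_normal((3, 4))
M = np.ones((3, 3))            # initialized to 1 -> reduces to a plain GCN layer
W = rng.standard_normal((4, 8))
out = weighted_adaptive_gcn_forward(X, A, M, W)
print(out.shape)  # (3, 8)
```

With M initialized to all ones the layer behaves exactly like an ordinary graph convolution; training then moves individual entries of M away from 1 to reweight specific skeletal edges.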
The model simultaneously processed four complementary modalities: the Joint Stream representing skeletal joint coordinates, the Bone Stream capturing limb connectivity and orientation, the Joint Motion Stream describing the temporal displacement of joints, and the Bone Motion Stream modeling dynamic variations in bone vectors. Through joint feature learning and deep interaction among these four streams, the network achieved an integrated understanding of both static posture configurations and dynamic motion trends. Experiments were conducted on a self-built ewe perinatal behavior video dataset, which included annotated samples of typical behaviors such as standing, lying, turning, nest-building, and lambing. The proposed improved 2s-AGCN model achieved a Top-1 classification accuracy of 86.21%, outperforming several state-of-the-art skeleton-based action recognition models, including ST-GCN, ST-GCN++, PoseC3D, Shift-GCN, CTR-GCN, and the original 2s-AGCN. Specifically, the gains in Top-1 accuracy over these models were 7.81%, 8.41%, 7.95%, 7.26%, 7.11%, and 6.89%, respectively. Despite its enhanced performance, the proposed model maintained a compact architecture with 5.70 million parameters and an average inference latency of 15.5 milliseconds per frame, demonstrating its suitability for real-time monitoring in farm environments. The findings confirm that skeleton-based graph convolutional models offer significant advantages for recognizing fine-grained animal behaviors during the perinatal period. The proposed improved 2s-AGCN framework effectively balances accuracy, efficiency, and real-time inference performance. By adaptively learning spatiotemporal dependencies and integrating multi-modal skeletal information, it provides a powerful tool for automatic behavior understanding in intelligent sheep farming systems.
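The four modalities named above are conventionally derived from the raw joint coordinates alone, as in prior multi-stream skeleton recognition work: bones are child-minus-parent joint differences, and the two motion streams are frame-to-frame differences. The sketch below shows this standard derivation; the toy skeleton, parent list, and function name are hypothetical and not taken from the paper.

```python
import numpy as np

def build_four_streams(joints, parents):
    """
    Derive the four input modalities from raw joint coordinates.
    joints: (T, V, C) array of V joint coordinates over T frames.
    parents: length-V list; parents[v] is the parent joint of v
             (the root points to itself, giving a zero bone vector).
    Returns joint, bone, joint-motion, and bone-motion tensors,
    each shaped (T, V, C) so all four streams share one architecture.
    """
    bone = joints - joints[:, parents, :]         # limb vectors: child - parent
    joint_motion = np.zeros_like(joints)
    joint_motion[1:] = joints[1:] - joints[:-1]   # frame-to-frame joint displacement
    bone_motion = np.zeros_like(bone)
    bone_motion[1:] = bone[1:] - bone[:-1]        # temporal change of limb vectors
    return joints, bone, joint_motion, bone_motion

# Toy example: 5 frames, 4 joints, 2-D coordinates (hypothetical skeleton).
rng = np.random.default_rng(0)
T, V, C = 5, 4, 2
seq = rng.standard_normal((T, V, C))
parents = [0, 0, 1, 2]                            # simple chain rooted at joint 0
j, b, jm, bm = build_four_streams(seq, parents)
print(j.shape, b.shape, jm.shape, bm.shape)
```

Each of the four arrays then feeds its own graph-convolutional stream, and the streams' predictions or features are fused downstream.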
This study demonstrates the feasibility of using deep graph learning for precision livestock monitoring and establishes a solid foundation for future applications such as early lambing prediction, automated reproductive management, and precision welfare assessment in smart pastures.