Abstract:
The precise monitoring of perinatal behaviors in ewes is crucial for improving reproductive efficiency, reducing the risk of dystocia, and optimizing farm management in intelligent livestock systems. However, the automatic recognition of perinatal behaviors from video data remains a major challenge due to their long temporal duration, subtle motion patterns, and high inter-class similarity. To address these issues, this study aimed to develop a robust and efficient multi-modal skeleton-based behavior recognition framework using an improved Two-Stream Adaptive Graph Convolutional Network (2s-AGCN). In this work, we designed a novel graph-based deep learning model capable of capturing both spatial and temporal dependencies in complex ewe behaviors. First, to overcome the limited adaptability of a fixed skeletal topology in representing subtle or dynamic motion changes, a Weighted Adaptive Graph Convolution Layer (W-AGCL) was introduced. This layer dynamically adjusted the connection strengths between skeletal nodes, enabling the model to adaptively learn the most informative spatial relationships according to behavioral context. The adaptive weighting mechanism enhanced the model's sensitivity to variations in spatial structure and improved its robustness to noise and individual differences among ewes. Second, to address the uneven temporal distribution and the presence of micro-movements in perinatal behaviors, we developed a Spatial Temporal Enhanced Attention Module (STE). This module selectively emphasized critical spatiotemporal regions by assigning higher attention weights to frames and joints carrying discriminative information, thereby improving the network's ability to capture subtle but behaviorally significant cues. Furthermore, a four-stream graph convolutional architecture was proposed to perform comprehensive multi-modal feature fusion.
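The W-AGCL described above can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names, the toy skeleton, and the choice of a single learnable elementwise weight matrix M over the normalized adjacency are assumptions made here for illustration (the original 2s-AGCN additionally combines fixed, learned, and data-dependent adjacency components).

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A + I, the standard GCN preprocessing step."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def weighted_adaptive_gcn_forward(X, A, M, W):
    """
    Hypothetical forward pass of a weighted adaptive graph convolution.
    X: (V, C_in) per-joint features; A: (V, V) skeletal adjacency;
    M: (V, V) learnable edge weights (updated by backprop in training);
    W: (C_in, C_out) feature transform.
    The elementwise product A_norm * M lets the layer strengthen or
    suppress individual skeletal connections per behavior context.
    """
    A_norm = normalize_adjacency(A)
    return (A_norm * M) @ X @ W

# Toy 3-joint chain skeleton (hypothetical, for shape checking only).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.standard_normal((3, 4))
M = np.ones((3, 3))            # initialized to 1 -> reduces to a plain GCN layer
W = rng.standard_normal((4, 8))
out = weighted_adaptive_gcn_forward(X, A, M, W)
print(out.shape)  # (3, 8)
```

With M initialized to all ones the layer behaves exactly like an ordinary graph convolution; training then moves individual entries of M away from 1 to reweight specific skeletal edges.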
The model simultaneously processed four complementary modalities: the Joint Stream representing skeletal joint coordinates, the Bone Stream capturing limb connectivity and orientation, the Joint Motion Stream describing the temporal displacement of joints, and the Bone Motion Stream modeling dynamic variations in bone vectors. Through joint feature learning and deep interaction among these four streams, the network achieved an integrated understanding of both static posture configurations and dynamic motion trends. Experiments were conducted on a self-built ewe perinatal behavior video dataset, which included annotated samples of typical behaviors such as standing, lying, turning, nest-building, and lambing. The proposed improved 2s-AGCN model achieved a Top-1 classification accuracy of 86.21%, outperforming several state-of-the-art skeleton-based action recognition models, including ST-GCN, ST-GCN++, PoseC3D, Shift-GCN, CTR-GCN, and the original 2s-AGCN. Specifically, the gains in Top-1 accuracy over these models were 7.81%, 8.41%, 7.95%, 7.26%, 7.11%, and 6.89%, respectively. Despite its enhanced performance, the proposed model maintained a compact architecture with 5.70 million parameters and an average inference latency of 15.5 milliseconds per frame, demonstrating its suitability for real-time monitoring in farm environments. The findings confirm that skeleton-based graph convolutional models offer significant advantages for recognizing fine-grained animal behaviors during the perinatal period. The proposed improved 2s-AGCN framework effectively balances accuracy, efficiency, and real-time inference performance. By adaptively learning spatiotemporal dependencies and integrating multi-modal skeletal information, it provides a powerful tool for automatic behavior understanding in intelligent sheep farming systems.
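The four modalities named above are conventionally derived from the raw joint coordinates alone, as in prior multi-stream skeleton recognition work: bones are child-minus-parent joint differences, and the two motion streams are frame-to-frame differences. The sketch below shows this standard derivation; the toy skeleton, parent list, and function name are hypothetical and not taken from the paper.

```python
import numpy as np

def build_four_streams(joints, parents):
    """
    Derive the four input modalities from raw joint coordinates.
    joints: (T, V, C) array of V joint coordinates over T frames.
    parents: length-V list; parents[v] is the parent joint of v
             (the root points to itself, giving a zero bone vector).
    Returns joint, bone, joint-motion, and bone-motion tensors,
    each shaped (T, V, C) so all four streams share one architecture.
    """
    bone = joints - joints[:, parents, :]         # limb vectors: child - parent
    joint_motion = np.zeros_like(joints)
    joint_motion[1:] = joints[1:] - joints[:-1]   # frame-to-frame joint displacement
    bone_motion = np.zeros_like(bone)
    bone_motion[1:] = bone[1:] - bone[:-1]        # temporal change of limb vectors
    return joints, bone, joint_motion, bone_motion

# Toy example: 5 frames, 4 joints, 2-D coordinates (hypothetical skeleton).
rng = np.random.default_rng(0)
T, V, C = 5, 4, 2
seq = rng.standard_normal((T, V, C))
parents = [0, 0, 1, 2]                            # simple chain rooted at joint 0
j, b, jm, bm = build_four_streams(seq, parents)
print(j.shape, b.shape, jm.shape, bm.shape)
```

Each of the four arrays then feeds its own graph-convolutional stream, and the streams' predictions or features are fused downstream.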
This study demonstrates the feasibility of using deep graph learning for precision livestock monitoring and establishes a solid foundation for future applications such as early lambing prediction, automated reproductive management, and precision welfare assessment in smart pastures.