
Video-based recognition method for perinatal behaviors of ewes using FS-SWAGCN

  • Abstract: Accurate monitoring of perinatal behaviors in ewes is important for improving reproductive efficiency, reducing the risk of dystocia, and optimizing farm management. To address the long temporal duration, low dynamic variation, and high similarity among certain behavior features in perinatal behavior videos, a multi-modal skeleton-based behavior recognition method built on an improved two-stream adaptive graph convolutional network (2s-AGCN) is proposed. First, to overcome the limited dynamics of a fixed skeletal topology in representing complex behaviors, a weighted adaptive graph convolution layer (W-AGCL) is introduced, which adaptively adjusts the connection weights between nodes and thereby improves the model's sensitivity and robustness to spatial structural changes. Second, to handle the micro-movements and uneven spatiotemporal distributions present in perinatal behaviors, a spatial temporal enhanced attention module (STE) is constructed to strengthen the perception and discrimination of key spatiotemporal regions. Finally, a four-stream graph convolutional architecture is designed that fuses four modalities (joint, bone, joint-motion, and bone-motion streams) for joint multi-modal spatiotemporal feature modeling and deep interaction. Experiments on a self-built ewe perinatal behavior video dataset show that the improved model, FS-SWAGCN, achieves a Top-1 accuracy of 86.21%, exceeding ST-GCN, ST-GCN++, PoseC3D, Shift-GCN, CTR-GCN, and 2s-AGCN by 7.81, 8.41, 7.95, 7.26, 7.11, and 6.89 percentage points, respectively. The model has 5.70 M parameters and a per-frame inference latency of 15.5 ms, meeting the real-time monitoring needs of farms. The results verify the advantages of skeleton-based graph convolutional behavior recognition in this domain and provide efficient, feasible technical support for lambing early warning on smart farms.
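The weighted adaptive graph convolution described above follows the 2s-AGCN idea of summing a fixed skeleton adjacency, a learned global topology, and a data-dependent similarity term. A minimal NumPy sketch is given below; the per-edge weight mask `W` is our assumption about how the "weighted" variant modulates connection strengths, since the abstract gives only the layer's purpose.

```python
import numpy as np

def weighted_adaptive_graph_conv(x, A, B, W, Wout):
    """Sketch of a weighted adaptive graph convolution (W-AGCL).

    x    : (C, T, V) joint features over T frames and V joints.
    A    : (V, V) fixed skeleton adjacency.
    B    : (V, V) learned global topology (trainable in a real model).
    W    : (V, V) per-edge weight mask (assumed mechanism for the
           "weighted" variant; trainable in a real model).
    Wout : (C_out, C) pointwise feature transform.
    A data-dependent term C_data is computed from node-feature
    similarity, as in 2s-AGCN's adaptive graph convolution.
    """
    C, T, V = x.shape
    feat = x.reshape(C * T, V)
    # Data-dependent adjacency from row-softmaxed feature similarity
    sim = feat.T @ feat                           # (V, V)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    C_data = e / e.sum(axis=1, keepdims=True)
    adj = W * (A + B) + C_data                    # weighted adaptive adjacency
    y = np.einsum("ctv,vw->ctw", x, adj)          # aggregate over neighbors
    return np.einsum("oc,ctw->otw", Wout, y)      # pointwise channel mixing
```

In a trained network `B` and `W` would be learned parameters and the similarity embeddings would be learned projections; here they are plain arrays to keep the sketch self-contained.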

     

    Abstract: The precise monitoring of perinatal behaviors in ewes is crucial for improving reproductive efficiency, reducing the risk of dystocia, and optimizing farm management in intelligent livestock systems. However, the automatic recognition of perinatal behaviors from video data remains a major challenge due to the long temporal duration, subtle motion patterns, and high inter-class similarity among behavioral features. To address these issues, this study aimed to develop a robust and efficient multi-modal skeleton-based behavior recognition framework using an improved Two-Stream Adaptive Graph Convolutional Network (2s-AGCN). In this work, we designed a novel graph-based deep learning model capable of capturing both spatial and temporal dependencies in complex ewe behaviors. First, to overcome the limited adaptability of fixed skeletal topology in representing subtle or dynamic motion changes, a Weighted Adaptive Graph Convolution Layer (W-AGCL) was introduced. This layer dynamically adjusted the connection strengths between skeletal nodes, enabling the model to adaptively learn the most informative spatial relationships according to the behavior context. The adaptive weighting mechanism enhanced the model’s sensitivity to spatial structure variations and improved its robustness to noise and individual differences among ewes. Second, to address the problem of uneven temporal distribution and the presence of micro-movements in perinatal behaviors, we developed a Spatial Temporal Enhanced Attention Module (STE). This module selectively emphasized critical spatiotemporal regions by assigning higher attention weights to frames and joints with discriminative information, thereby improving the network’s ability to capture subtle but behaviorally significant cues. Furthermore, a four-stream graph convolutional architecture was proposed to perform comprehensive multi-modal feature fusion. 
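The STE module's gating behavior, emphasizing discriminative frames and joints, can be sketched as a pair of attention vectors applied along the temporal and joint axes. The pooling-plus-sigmoid design below is our assumption; the abstract specifies only the module's purpose.

```python
import numpy as np

def ste_attention(x):
    """Sketch of a spatial-temporal enhanced attention (STE) gating step.

    x : (C, T, V) feature map. A frame attention a_t and a joint
    attention a_v are derived by mean-pooling over the other axes and
    squashing with a sigmoid; the features are then rescaled so that
    more informative frames and joints receive larger weights.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    a_t = sigmoid(x.mean(axis=(0, 2)))      # (T,) temporal attention
    a_v = sigmoid(x.mean(axis=(0, 1)))      # (V,) spatial (joint) attention
    return x * a_t[None, :, None] * a_v[None, None, :]
```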
The model simultaneously processed four complementary modalities: the Joint Stream representing skeletal joint coordinates, the Bone Stream capturing limb connectivity and orientation, the Joint Motion Stream describing temporal displacement of joints, and the Bone Motion Stream modeling dynamic variations in bone vectors. Through joint feature learning and deep interaction among these four streams, the network achieved an integrated understanding of both static posture configurations and dynamic motion trends. Experiments were conducted on a self-built ewe perinatal behavior video dataset, which included annotated samples of typical behaviors such as standing, lying, turning, nest-building, and lambing. The proposed FS-SWAGCN model (the improved 2s-AGCN) achieved a Top-1 classification accuracy of 86.21%, outperforming several state-of-the-art skeleton-based action recognition models, including ST-GCN, ST-GCN++, PoseC3D, Shift-GCN, CTR-GCN, and the original 2s-AGCN. Specifically, the gains in Top-1 accuracy over these models were 7.81, 8.41, 7.95, 7.26, 7.11, and 6.89 percentage points, respectively. Despite its enhanced performance, the proposed model maintained a compact architecture with 5.70 million parameters, and the average inference latency per frame was 15.5 milliseconds, demonstrating that it was fully capable of supporting real-time monitoring applications in farm environments. The findings confirm that skeleton-based graph convolutional models hold significant advantages in recognizing fine-grained animal behaviors during the perinatal period. The proposed FS-SWAGCN framework effectively balances accuracy, efficiency, and real-time inference performance. By adaptively learning spatiotemporal dependencies and integrating multi-modal skeletal information, it provides a powerful tool for automatic behavior understanding in intelligent sheep farming systems.
This study demonstrates the feasibility of using deep graph learning for precision livestock monitoring and establishes a solid foundation for future applications such as early lambing prediction, automated reproductive management, and precision welfare assessment in smart pastures.
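The four input modalities described in the abstract can be derived from raw joint coordinates with the standard bone/motion construction used in the 2s-AGCN line of work, followed by late fusion of the per-stream class scores. The sketch below assumes a parent-pointer skeleton definition and equal fusion weights; both are illustrative choices, not details taken from the paper.

```python
import numpy as np

def build_four_streams(joints, parents):
    """Build the four skeleton modalities from joint coordinates.

    joints  : (T, V, 2) joint coordinates per frame.
    parents : length-V array giving each joint's parent in the skeleton
              tree (a root points to itself); the tree itself is an
              assumption about the ewe skeleton definition.
    Returns joint, bone, joint-motion, and bone-motion tensors.
    """
    joint = joints
    bone = joints - joints[:, parents, :]         # vector toward parent joint
    joint_motion = np.zeros_like(joints)
    joint_motion[1:] = joints[1:] - joints[:-1]   # frame-to-frame displacement
    bone_motion = np.zeros_like(bone)
    bone_motion[1:] = bone[1:] - bone[:-1]        # bone-vector dynamics
    return joint, bone, joint_motion, bone_motion

def fuse_scores(stream_scores, weights=None):
    """Late fusion: weighted sum of per-stream class scores
    (equal weights assumed when none are given)."""
    scores = np.stack(stream_scores)              # (S, num_classes)
    if weights is None:
        weights = np.ones(len(stream_scores)) / len(stream_scores)
    return np.asarray(weights) @ scores
```

In the full network, each of the four tensors would feed its own graph-convolutional stream, and `fuse_scores` would combine the four softmax outputs into the final prediction.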

     

