Multimodal recognition of pig behavior using vision and sensors

陈虹余; 尹令; 杨明; 张素敏; 林俊勇

doi:10.11975/j.issn.1002-6819.202410206

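Abstract: To improve the accuracy of pig behavior recognition in complex scenarios, this study proposed a multimodal pig behavior recognition method that fuses visual and sensor technologies and applies a different processing strategy to each data scenario. Under insufficient light, behavior is classified from sensor data. Under adequate light with severe occlusion of the pig, video and sensor data are combined for classification. Under adequate light without severe occlusion, an FE-TSM (feature extension-temporal shift module) model built on video data performs the classification. The method was used to analyze video and sensor data from 8 Landrace pigs, 7 d per pig, and pig behavior was divided into five classes: lying on the side, crouching, half-sitting, eating, and moving. In the comparative experiments, the average recognition accuracy was 68.60% with sensor data only and 78.78% with visual data only, whereas the multimodal method fusing sensor and video data reached 88.82%. This shows that the multimodal method recognizes pig behavior more accurately in complex scenarios.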
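The scenario-dependent strategy amounts to a dispatch on lighting and occlusion. The Python sketch below illustrates that dispatch under stated assumptions. The abstract does not say how lighting or occlusion is quantified, so the `brightness` and `occlusion` scores, both thresholds, and the names `Observation`, `choose_strategy`, and `classify` are all hypothetical. The three per-scenario models are passed in as placeholder callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical cut-offs: the abstract does not state how the scenarios are detected.
BRIGHTNESS_THRESHOLD = 0.3   # assumed normalized mean frame brightness in [0, 1]
OCCLUSION_THRESHOLD = 0.5    # assumed fraction of the pig hidden from the camera


@dataclass
class Observation:
    brightness: float                            # assumed lighting score
    occlusion: float                             # assumed occlusion score
    frames: List = field(default_factory=list)   # video clip (placeholder)
    accel: List = field(default_factory=list)    # ear-tag acceleration (placeholder)


def choose_strategy(obs: Observation) -> str:
    """Pick one of the three processing strategies described in the abstract."""
    if obs.brightness < BRIGHTNESS_THRESHOLD:
        return "sensor_single_layer"       # insufficient light: sensor data only
    if obs.occlusion >= OCCLUSION_THRESHOLD:
        return "video_sensor_dual_layer"   # adequate light, severe occlusion
    return "video_fe_tsm"                  # adequate light, at most slight occlusion


def classify(obs: Observation, models: Dict[str, Callable[[Observation], str]]) -> str:
    """Route the observation to the model registered for its scenario."""
    return models[choose_strategy(obs)](obs)


if __name__ == "__main__":
    # Stub models standing in for the real classifiers (hypothetical).
    stub_models = {
        "sensor_single_layer": lambda o: "lying on the side",
        "video_sensor_dual_layer": lambda o: "eating",
        "video_fe_tsm": lambda o: "moving",
    }
    print(classify(Observation(brightness=0.1, occlusion=0.0), stub_models))
```

Checking lighting before occlusion mirrors the order in the abstract: occlusion only matters once the video is usable at all.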

Abstract: Pig behavior monitoring is crucial in precision livestock farming, because behavioral changes not only reflect health status but also affect production performance, reproductive efficiency, and meat quality. By monitoring pig behavior, breeders can adjust feeding strategies promptly to improve production efficiency and establish an effective disease early-warning mechanism, thereby safeguarding pig welfare and maximizing economic returns. However, continuous manual observation and identification of herd behavior is time-consuming, laborious, and difficult to implement. Intelligent recognition of pig behavior has therefore become a research focus, with two main approaches: visual technology and sensor technology. Visual recognition methods are cost-effective and easy to deploy, but their accuracy suffers under illumination changes and occlusion, and their ability to identify individual animals is limited. Sensor-based techniques require each animal to wear a sensing node, which raises costs, and they are less accurate at distinguishing subtle behavioral differences. To overcome the limitations of single-technology recognition and improve behavior recognition accuracy in complex scenarios, this study proposed a multimodal pig behavior recognition method that integrates visual and sensor technologies and applies a different processing strategy to each scenario. Three data scenarios were defined: insufficient light, adequate light with severe occlusion, and adequate light without severe occlusion (slight occlusion included). Under insufficient light, a single-layer sensor classification method was used: ear-tag acceleration signals were first segmented into multiple time windows, and 49 motion features were extracted from each window.
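The windowing and feature-extraction step for the ear-tag signal can be sketched in Python as follows. This is an illustration, not the authors' implementation. The abstract gives neither the window length, the overlap, nor the list of 49 features, so `size` and `step` are left to the caller. Only 15 common time-domain features are computed as stand-ins: per-axis mean, standard deviation, range, and energy, plus magnitude statistics and signal magnitude area. The function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Sample3 = Tuple[float, float, float]  # one tri-axial ear-tag reading (ax, ay, az)


def sliding_windows(signal: Sequence[Sample3], size: int, step: int) -> List[Sequence[Sample3]]:
    """Split the signal into fixed-length windows; size and step are assumptions."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def window_features(window: Sequence[Sample3]) -> Dict[str, float]:
    """Compute a small, illustrative subset of motion features for one window."""
    feats: Dict[str, float] = {}
    for k, axis in enumerate("xyz"):
        values = [s[k] for s in window]
        feats[f"mean_{axis}"] = statistics.fmean(values)
        feats[f"std_{axis}"] = statistics.pstdev(values)
        feats[f"range_{axis}"] = max(values) - min(values)
        feats[f"energy_{axis}"] = sum(v * v for v in values) / len(values)
    # Overall acceleration magnitude per reading.
    magnitude = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    feats["mean_mag"] = statistics.fmean(magnitude)
    feats["std_mag"] = statistics.pstdev(magnitude)
    # Signal magnitude area: mean of the summed absolute axis values.
    feats["sma"] = sum(abs(x) + abs(y) + abs(z) for x, y, z in window) / len(window)
    return feats


if __name__ == "__main__":
    # Synthetic signal: 10 still readings, then 10 readings with a tilted head.
    signal = [(0.0, 0.0, 1.0)] * 10 + [(0.5, -0.5, 1.0)] * 10
    for w in sliding_windows(signal, size=10, step=5):
        print(window_features(w)["sma"])
```

Overlapping windows (here `step` smaller than `size`) are a common choice for accelerometer data because a behavior boundary then falls inside more than one window; whether the study used overlap is not stated.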
Next, random forest feature importance was used to select key features, and behaviors were recognized with five machine learning models: Baseline, Logit, Random Forest, SVM, and KNN. Under adequate light with severe occlusion, a two-layer video-sensor classification method was used: the YOLOv8 model first classified pig posture from the video, and ear-tag acceleration data were then used to classify behavior in detail on the basis of the recognized posture. Under adequate light without severe occlusion, a video-based FE-TSM (feature extension-temporal shift module) model was developed. This architecture combines CBAM (convolutional block attention module) feature enhancement with a temporal shift network, so that the model focuses on key feature regions while retaining essential temporal information that excessive shifting might otherwise lose. Visual, sensor, and multimodal methods were each applied to the video and sensor data of eight Landrace pigs, with 7 days of data per pig. Pig behaviors were classified into five categories: lying on the side, crouching, half-sitting, eating, and moving. The average recognition accuracy was 68.60% using sensor data alone and 78.78% using visual data alone, whereas the multimodal method integrating sensor and video data achieved 88.82%. The multimodal approach therefore substantially improved the accuracy of pig behavior recognition in complex scenarios. By combining visual and sensor technologies, the method compensates for the limitations of each single modality, distinguishes the five behaviors more accurately, and makes pig behavior recognition more reliable under challenging environmental conditions.
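The temporal shift underlying FE-TSM can be illustrated with a minimal pure-Python sketch of the standard temporal shift module (TSM) operation. It is not FE-TSM itself, whose CBAM integration and shift settings are not detailed in the abstract. Features are assumed to be a `T × C` nested list with spatial dimensions omitted. In the first fold of channels, frame t receives the values of frame t+1. In the second fold, it receives the values of frame t-1. Vacated positions are zero-padded and the remaining channels stay in place. The default `fold_div=8` follows the setting commonly used with the original TSM and is an assumption here, as is the function name `temporal_shift`.

```python
from typing import List


def temporal_shift(x: List[List[float]], fold_div: int = 8) -> List[List[float]]:
    """Shift two channel folds by one frame in opposite directions (zero padding).

    x is indexed as x[t][c]: T frames, C channels per frame (spatial dims omitted).
    """
    t_len = len(x)
    c_len = len(x[0]) if x else 0
    fold = c_len // fold_div
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < fold:
                # First fold: frame t takes the value from frame t + 1.
                if t + 1 < t_len:
                    out[t][c] = x[t + 1][c]
            elif c < 2 * fold:
                # Second fold: frame t takes the value from frame t - 1.
                if t - 1 >= 0:
                    out[t][c] = x[t - 1][c]
            else:
                # Remaining channels are not shifted.
                out[t][c] = x[t][c]
    return out


if __name__ == "__main__":
    clip = [[float(10 * t + c) for c in range(8)] for t in range(3)]
    for row in temporal_shift(clip, fold_div=4):
        print(row)
```

The shift only moves values, so it adds no parameters or multiplications. Shifting a larger fraction of channels mixes in more temporal context but leaves fewer unshifted channels for spatial features, which is why only a fraction is shifted.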

Multimodal recognition of pig behavior using vision and sensors