
Multi-view voxel-based 3D skeleton reconstruction for pig lameness detection

  • Abstract: Pig lameness is an important health indicator affecting farming profitability and animal welfare. Existing lameness detection methods are mostly based on a single viewpoint and constrained walkways, which restrict the pigs' freedom of movement and are susceptible to occlusion. To achieve natural, non-contact, and high-precision pig lameness detection, this study proposes an end-to-end multi-view 3D skeleton reconstruction network that requires no 3D annotations, termed the multi-view voxel network (MV-VoxelNet), together with a lameness detection method based on the reconstructed 3D skeletons. MV-VoxelNet adopts a "global localization – local refinement" structure, introduces a voxel cross-view attention (VCVA) module, and employs a lightweight refinement voxel-to-voxel (RefineV2V) network to reconstruct high-precision 3D keypoint coordinates of pigs from synchronized multi-view images. Building on this, continuous 3D skeleton coordinates are predicted from gait-cycle video sequences, and a dynamic local coordinate system robust to changes in walking direction is proposed for freely moving pigs. Six gait features, including hoof symmetry, are designed in this coordinate system, and a support vector machine (SVM) is used to classify pigs into normal, mildly lame, and severely lame classes. Results show that, without any 3D annotations, MV-VoxelNet achieves a reprojection error (RE) of 10.167 pixels and a percentage of correct keypoints at a 0.05 normalized threshold (PCK@0.05) of 88.83%, an improvement of 13.59 percentage points over the original model; the lameness detection accuracy reaches 88.51%, verifying the feasibility of the proposed method. This study provides a high-precision, low-annotation-cost, and easily deployable technical solution for 3D pig pose modeling and automatic lameness detection.


    Abstract: Pig lameness is one of the most important health indicators of production efficiency and animal welfare in modern pig farming. Existing lameness detection methods mostly rely on a constrained walkway observed by a single-view camera, which is highly sensitive to occlusion and self-occlusion. Walkway-based acquisition also requires special facilities, and the stress-related, unnatural locomotion it induces limits the reliability and representativeness of the gait data. In this study, an end-to-end multi-view 3D pose estimation network that requires no 3D annotations, termed the Multi-View Voxel Network (MV-VoxelNet), was proposed for natural, non-contact, and high-precision pig lameness detection under free-moving conditions, and a lameness detection method was developed on top of the reconstructed 3D skeletons. Accurate 3D pig keypoints were extracted from synchronized multi-view RGB images using a coarse-to-fine process of global localization followed by local refinement. In the global localization stage, a coarse 3D representation of the pig was constructed by fusing the multi-view observations into a voxel grid. A voxel cross-view attention (VCVA) module was introduced to adaptively fuse features from the different camera views, effectively integrating multi-view information for robustness under occlusion. The fused voxel features were processed by a 3D voxel-to-voxel network to obtain an initial 3D skeleton. In the local refinement stage, a lightweight refinement voxel-to-voxel network (RefineV2V) refined each keypoint within a small voxel region centered on its initial prediction, achieving more precise localization at low computational cost. Continuous 3D keypoint coordinate sequences were then obtained by applying the reconstruction to consecutive video frames.
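The abstract does not give the internal details of the VCVA module; as an illustration only, a minimal sketch of attention-weighted fusion over per-view voxel features might look like the following, where `w_q` and `w_k` stand in for trained projection weights and the flattened voxel layout is an assumption:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_voxel_fusion(view_feats, w_q, w_k):
    """Attention-weighted fusion of per-view voxel features.

    view_feats: (V, N, C) array -- V camera views, N voxels, C channels.
    w_q, w_k:   (C, C) projection matrices (placeholders for trained
                parameters; the real VCVA architecture is not specified
                in the abstract).
    A per-voxel score is computed for each view, then the views are
    blended with softmax weights, so that occluded or uninformative
    views contribute less to the fused volume.
    """
    q = view_feats.mean(axis=0) @ w_q                      # (N, C) query
    k = view_feats @ w_k                                   # (V, N, C) keys
    scores = (k * q).sum(axis=-1) / np.sqrt(q.shape[-1])   # (V, N)
    weights = softmax(scores, axis=0)                      # normalize over views
    return (weights[..., None] * view_feats).sum(axis=0)   # (N, C) fused
```

The fused output is a convex combination of the views at each voxel, which keeps the feature scale stable regardless of how many cameras observe the pig.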
A dynamic local coordinate system was constructed to normalize the gait representations and to handle free movement and varying walking directions. Specifically, the four hoof contact points were used to estimate an optimal ground support plane via singular value decomposition, and the plane normal was defined as the vertical axis. The forward body direction was determined from the orientation of the trunk keypoints together with the support plane, yielding a stable local coordinate system for each frame. All 3D keypoints were then transformed into this coordinate system, effectively eliminating the influence of global orientation and enabling consistent gait analysis under free-walking conditions. Furthermore, six gait features were designed in this dynamic local coordinate system to characterize pig locomotion. These features captured complementary aspects of gait abnormality, including hoof symmetry, trunk vertical stability, lateral trunk inclination, gait-phase imbalance, swing initiation and termination consistency, and fore–hind hoof following behavior. The extracted features were fed into a support vector machine (SVM) classifier to identify normal, mildly lame, and severely lame pigs. Results showed that MV-VoxelNet achieved a reprojection error (RE) of 10.17 pixels and a percentage of correct keypoints at a 0.05 normalized threshold (PCK@0.05) of 88.83% without any 3D ground-truth annotations, improving on the baseline by 13.59 percentage points in PCK@0.05. Moreover, the lameness detection achieved an overall classification accuracy of 88.51%, indicating the effectiveness of the framework. Overall, a unified pipeline was presented that reconstructs 3D pig skeletons from synchronized multi-view RGB images, extracts gait features in a dynamic local coordinate system, and classifies lameness severity with a machine learning classifier.
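The construction of the dynamic local coordinate system (SVD plane fit on the four hoof points, plane normal as the vertical axis, trunk orientation projected onto the plane as the forward axis) can be sketched as follows; the keypoint names and axis conventions are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def fit_support_plane(hooves):
    """Least-squares ground plane through the four hoof contact points.
    The plane normal is the direction of least variance of the centred
    points, i.e. the last right-singular vector of the SVD."""
    centroid = hooves.mean(axis=0)
    _, _, vt = np.linalg.svd(hooves - centroid)
    normal = vt[-1]
    if normal[2] < 0:          # orient the normal "upwards" (assumed world z-up)
        normal = -normal
    return centroid, normal

def local_frame(hooves, trunk_front, trunk_rear):
    """Per-frame dynamic local coordinate system:
    z = support-plane normal, x = trunk direction projected onto the
    plane, y = z x x. `trunk_front`/`trunk_rear` are placeholder names
    for trunk keypoints."""
    origin, z = fit_support_plane(hooves)
    trunk = trunk_front - trunk_rear            # forward body direction
    x = trunk - np.dot(trunk, z) * z            # project onto the plane
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                     # rows are the local axes
    return origin, R

def to_local(points, origin, R):
    """Express world-frame keypoints in the local gait frame, removing
    the pig's global position and heading."""
    return (points - origin) @ R.T
```

Because the frame is rebuilt every frame from the animal itself, the gait features computed in it are invariant to where and in which direction the pig happens to walk.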
The framework requires no 3D annotations and only a few standard RGB cameras, implying low annotation cost and easy deployment. Accurate, contact-free monitoring of pig gait can therefore be expected in real-world pig farms. These findings can provide a scalable solution for 3D pig pose modeling and lameness detection.
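For reference, the PCK@0.05 metric reported above counts a keypoint as correct when its error is below 5% of a normalization length; the abstract does not state which reference length is used (e.g. a bounding-box diagonal or body length), so it is left as a parameter in this sketch:

```python
import numpy as np

def pck(pred, gt, norm_len, thresh=0.05):
    """Percentage of correct keypoints.

    pred, gt: (K, D) arrays of K predicted / ground-truth keypoints.
    norm_len: normalization length (the exact reference length is an
              assumption here -- e.g. the animal's bounding-box diagonal).
    A keypoint is correct when its Euclidean error is below
    thresh * norm_len; the score is the fraction of correct keypoints.
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    return float((err < thresh * norm_len).mean())
```

For example, with a 10-unit normalization length the tolerance is 0.5 units, so a keypoint 1 unit off counts as incorrect while an exact match counts as correct.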

