Abstract:
Pig lameness is an important health indicator that significantly affects production efficiency and animal welfare in modern pig farming. Existing lameness detection methods mostly rely on single-view cameras or constrained walkway systems, which are highly sensitive to occlusion and self-occlusion. Walkway-based acquisition further requires dedicated facilities and may induce stress-related, unnatural locomotion, limiting the reliability and representativeness of the collected gait data. To enable natural, non-contact, and high-precision pig lameness detection under free-moving conditions, this study proposes an end-to-end multi-view 3D pose estimation network that requires no 3D annotations, termed the Multi-View Voxel Network (MV-VoxelNet), and develops a lameness detection method based on the reconstructed 3D skeletons. MV-VoxelNet reconstructs accurate 3D pig keypoints from synchronized multi-view RGB images through a coarse-to-fine process of global localization followed by local refinement. In the global localization stage, a coarse 3D representation of the pig is constructed by fusing multi-view observations into a voxel grid. To integrate multi-view information effectively and enhance robustness under occlusion, a voxel cross-view attention (VCVA) module is introduced to adaptively fuse features from different camera views. The fused voxel features are processed by a 3D voxel-to-voxel network to obtain an initial 3D skeleton. In the local refinement stage, a lightweight refinement voxel-to-voxel network (RefineV2V) refines each keypoint by constructing a small voxel region centered on its initial prediction, enabling more precise localization at low computational cost. From the reconstructed 3D skeletons, continuous 3D keypoint coordinate sequences are obtained over consecutive video frames. To handle free movement and varying walking directions, a dynamic local coordinate system is constructed to normalize gait representations.
Specifically, four hoof contact points are extracted to estimate an optimal ground support plane via singular value decomposition, and the plane normal defines the vertical axis. The forward body direction is determined from the orientation of trunk keypoints and, together with the support plane, establishes a stable local coordinate system for each frame. All 3D keypoints are then transformed into this coordinate system, effectively eliminating the influence of global orientation changes and enabling consistent gait analysis under free-walking conditions. Within this dynamic local coordinate system, six gait features are designed to characterize pig locomotion. These features capture complementary aspects of gait abnormality: hoof symmetry, trunk vertical stability, lateral trunk inclination, gait phase imbalance, swing initiation and termination consistency, and fore–hind hoof following behavior. The extracted features are then fed into a support vector machine (SVM) classifier to distinguish normal, mildly lame, and severely lame pigs. Results show that, without any 3D ground-truth annotations, MV-VoxelNet achieves a reprojection error (RE) of 10.17 pixels and a percentage of correct keypoints at a normalized threshold of 0.05 (PCK@0.05) of 88.83%, improving on the baseline by 5.44 pixels and 13.59 percentage points, respectively. Moreover, the proposed lameness detection method achieves an overall classification accuracy of 88.51%, demonstrating the effectiveness of the framework. Overall, this study presents a unified pipeline that reconstructs 3D pig skeletons from synchronized multi-view RGB images, extracts gait features in a dynamic local coordinate system, and classifies lameness severity with a machine learning classifier. The framework requires no 3D annotations and uses only a few standard RGB cameras, keeping annotation cost low and deployment simple.
These results indicate the potential of the proposed method for accurate, contact-free pig gait monitoring in real-world pig farms, providing a scalable solution for 3D pig pose modeling and automatic lameness detection.
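The plane-fitting and normalization step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes four hoof points, two trunk keypoints (a hypothetical front/rear pair) for the forward direction, and a world z-axis that roughly points upward when orienting the plane normal.

```python
import numpy as np

def fit_support_plane(hoof_pts):
    """Least-squares plane fit to hoof contact points via SVD.

    hoof_pts: (4, 3) array of 3D hoof positions.
    Returns (centroid, unit normal); the normal is the right singular
    vector associated with the smallest singular value.
    """
    centroid = hoof_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(hoof_pts - centroid)
    normal = vt[-1]
    if normal[2] < 0:  # assumption: world z points roughly upward
        normal = -normal
    return centroid, normal

def local_frame(hoof_pts, trunk_front, trunk_rear):
    """Build a per-frame dynamic local coordinate system.

    Vertical axis = support-plane normal; forward axis = trunk direction
    projected onto the plane; lateral axis completes a right-handed
    basis. Returns (origin, 3x3 rotation with the axes as rows).
    """
    origin, z_axis = fit_support_plane(hoof_pts)
    fwd = trunk_front - trunk_rear
    fwd = fwd - fwd.dot(z_axis) * z_axis   # project onto support plane
    x_axis = fwd / np.linalg.norm(fwd)
    y_axis = np.cross(z_axis, x_axis)
    return origin, np.stack([x_axis, y_axis, z_axis])

def to_local(keypoints, origin, R):
    """Transform (N, 3) world keypoints into the local frame."""
    return (keypoints - origin) @ R.T
```

Transforming every frame's keypoints with `to_local` removes global position and heading, so the six gait features can be computed in a pose-invariant frame regardless of the pig's walking direction.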