
Multi-view voxel-based 3D skeleton reconstruction for pig lameness detection

  • Abstract: Pig lameness is an important health indicator affecting farming profitability and animal welfare. Existing lameness detection methods are mostly based on a single viewpoint and constrained walkways, which restrict the pigs' freedom of movement and are susceptible to occlusion. To achieve natural, non-contact, and high-precision pig lameness detection, this study proposes an end-to-end multi-view 3D skeleton reconstruction network that requires no 3D annotations, termed the multi-view voxel network (MV-VoxelNet), together with a lameness detection method based on the reconstructed 3D skeletons. MV-VoxelNet adopts a "global localization – local refinement" structure, introduces a voxel cross-view attention (VCVA) module, and employs a lightweight refinement voxel-to-voxel (RefineV2V) network to reconstruct high-precision 3D keypoint coordinates of pigs from synchronized multi-view images. Building on this, continuous 3D skeleton coordinates are predicted from gait-cycle video sequences, and a dynamic local coordinate system robust to changes in walking direction is proposed for freely moving pigs. Six gait features, including hoof symmetry, are designed in this coordinate system, and a support vector machine (SVM) is used to classify pigs into normal, mildly lame, and severely lame classes. Results show that, without any 3D annotations, MV-VoxelNet achieves a reprojection error (RE) of 10.167 pixels and a percentage of correct keypoints at a 0.05 normalized threshold (PCK@0.05) of 88.83%, an improvement of 13.59 percentage points over the original model; the lameness detection accuracy reaches 88.51%, verifying the feasibility of the proposed method. This study provides a high-precision, low-annotation-cost, and easily deployable technical solution for 3D pig pose modeling and automatic lameness detection.


    Abstract: Pig lameness is one of the most important health indicators of production efficiency and animal welfare in modern pig farming. Existing lameness detection methods mostly rely on a constrained walkway observed by a single-view camera, which is highly sensitive to occlusion and self-occlusion. Walkway-based acquisition also requires special facilities, and the stress-related, unnatural locomotion it induces limits the reliability and representativeness of the gait data. In this study, an end-to-end multi-view 3D pose estimation network that requires no 3D annotations, termed the Multi-View Voxel Network (MV-VoxelNet), was proposed for natural, non-contact, and high-precision pig lameness detection under free-moving conditions, and a lameness detection method was developed on top of the reconstructed 3D skeletons. Accurate 3D pig keypoints were extracted from synchronized multi-view RGB images using a coarse-to-fine process of global localization followed by local refinement. In the global localization stage, a coarse 3D representation of the pig was constructed by fusing the multi-view observations into a voxel grid. A voxel cross-view attention (VCVA) module was introduced to adaptively fuse features from the different camera views, effectively integrating multi-view information for robustness under occlusion. The fused voxel features were processed by a 3D voxel-to-voxel network to obtain an initial 3D skeleton. In the local refinement stage, a lightweight refinement voxel-to-voxel network (RefineV2V) refined each keypoint within a small voxel region centered on its initial prediction, achieving more precise localization at low computational cost. Continuous 3D keypoint coordinate sequences were then obtained by applying the reconstruction to consecutive video frames.
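The abstract does not give the internal details of the VCVA module; as an illustration only, a minimal sketch of attention-weighted fusion over per-view voxel features might look like the following, where `w_q` and `w_k` stand in for trained projection weights and the flattened voxel layout is an assumption:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_voxel_fusion(view_feats, w_q, w_k):
    """Attention-weighted fusion of per-view voxel features.

    view_feats: (V, N, C) array -- V camera views, N voxels, C channels.
    w_q, w_k:   (C, C) projection matrices (placeholders for trained
                parameters; the real VCVA architecture is not specified
                in the abstract).
    A per-voxel score is computed for each view, then the views are
    blended with softmax weights, so that occluded or uninformative
    views contribute less to the fused volume.
    """
    q = view_feats.mean(axis=0) @ w_q                      # (N, C) query
    k = view_feats @ w_k                                   # (V, N, C) keys
    scores = (k * q).sum(axis=-1) / np.sqrt(q.shape[-1])   # (V, N)
    weights = softmax(scores, axis=0)                      # normalize over views
    return (weights[..., None] * view_feats).sum(axis=0)   # (N, C) fused
```

The fused output is a convex combination of the views at each voxel, which keeps the feature scale stable regardless of how many cameras observe the pig.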
A dynamic local coordinate system was constructed to normalize the gait representations and to handle free movement and varying walking directions. Specifically, the four hoof contact points were used to estimate an optimal ground support plane via singular value decomposition, and the plane normal was defined as the vertical axis. The forward body direction was determined from the orientation of the trunk keypoints together with the support plane, yielding a stable local coordinate system for each frame. All 3D keypoints were then transformed into this coordinate system, effectively eliminating the influence of global orientation and enabling consistent gait analysis under free-walking conditions. Furthermore, six gait features were designed in this dynamic local coordinate system to characterize pig locomotion. These features captured complementary aspects of gait abnormality, including hoof symmetry, trunk vertical stability, lateral trunk inclination, gait-phase imbalance, swing initiation and termination consistency, and fore–hind hoof following behavior. The extracted features were fed into a support vector machine (SVM) classifier to identify normal, mildly lame, and severely lame pigs. Results showed that MV-VoxelNet achieved a reprojection error (RE) of 10.17 pixels and a percentage of correct keypoints at a 0.05 normalized threshold (PCK@0.05) of 88.83% without any 3D ground-truth annotations, improving on the baseline by 13.59 percentage points in PCK@0.05. Moreover, the lameness detection achieved an overall classification accuracy of 88.51%, indicating the effectiveness of the framework. Overall, a unified pipeline was presented that reconstructs 3D pig skeletons from synchronized multi-view RGB images, extracts gait features in a dynamic local coordinate system, and classifies lameness severity with a machine learning classifier.
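The construction of the dynamic local coordinate system (SVD plane fit on the four hoof points, plane normal as the vertical axis, trunk orientation projected onto the plane as the forward axis) can be sketched as follows; the keypoint names and axis conventions are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def fit_support_plane(hooves):
    """Least-squares ground plane through the four hoof contact points.
    The plane normal is the direction of least variance of the centred
    points, i.e. the last right-singular vector of the SVD."""
    centroid = hooves.mean(axis=0)
    _, _, vt = np.linalg.svd(hooves - centroid)
    normal = vt[-1]
    if normal[2] < 0:          # orient the normal "upwards" (assumed world z-up)
        normal = -normal
    return centroid, normal

def local_frame(hooves, trunk_front, trunk_rear):
    """Per-frame dynamic local coordinate system:
    z = support-plane normal, x = trunk direction projected onto the
    plane, y = z x x. `trunk_front`/`trunk_rear` are placeholder names
    for trunk keypoints."""
    origin, z = fit_support_plane(hooves)
    trunk = trunk_front - trunk_rear            # forward body direction
    x = trunk - np.dot(trunk, z) * z            # project onto the plane
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                     # rows are the local axes
    return origin, R

def to_local(points, origin, R):
    """Express world-frame keypoints in the local gait frame, removing
    the pig's global position and heading."""
    return (points - origin) @ R.T
```

Because the frame is rebuilt every frame from the animal itself, the gait features computed in it are invariant to where and in which direction the pig happens to walk.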
The framework requires no 3D annotations and only a few standard RGB cameras, implying low annotation cost and easy deployment. Accurate, contact-free monitoring of pig gait can therefore be expected in real-world pig farms. These findings can provide a scalable solution for 3D pig pose modeling and lameness detection.
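For reference, the PCK@0.05 metric reported above counts a keypoint as correct when its error is below 5% of a normalization length; the abstract does not state which reference length is used (e.g. a bounding-box diagonal or body length), so it is left as a parameter in this sketch:

```python
import numpy as np

def pck(pred, gt, norm_len, thresh=0.05):
    """Percentage of correct keypoints.

    pred, gt: (K, D) arrays of K predicted / ground-truth keypoints.
    norm_len: normalization length (the exact reference length is an
              assumption here -- e.g. the animal's bounding-box diagonal).
    A keypoint is correct when its Euclidean error is below
    thresh * norm_len; the score is the fraction of correct keypoints.
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    return float((err < thresh * norm_len).mean())
```

For example, with a 10-unit normalization length the tolerance is 0.5 units, so a keypoint 1 unit off counts as incorrect while an exact match counts as correct.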

