Abstract:
High breeding density poses a great risk to the prevention and control of swine fever in intensive pig production. Accurate and rapid detection of individual live pigs can help take timely countermeasures to minimize the incidence of disease. However, factors such as pig adhesion and occlusion by sundries make the detection of individual pigs in multi-target scenes difficult. In this study, a dual-attention feature pyramid network was proposed to rapidly detect group-housed breeding pigs. A total of 45 live pigs aged 20 to 105 days in 8 pens were selected as the research objects. A head-up angle of view was used to collect a total of 3 834 labeled images, of which 2 490 were set as the training set, 480 as the validation set, and 864 as the test set. Two types of attention units were introduced into the Feature Pyramid Network (FPN) to encode semantic interdependencies in the channel dimension (the Channel Attention Unit (CAU)) and the spatial dimension (the Position Attention Unit (PAU)), respectively. The rationale was that an attention-based method increases the weight of regional information for better instance detection, while suppressing secondary information for a better model. The CAU selectively enhanced the interdependencies among channels by integrating the associated features, while the PAU selectively aggregated the feature at each position through a weighted sum of features at all positions. A Dual Attention Unit (DAU) was then proposed to flexibly fuse CAU features with PAU information. An asymmetric convolution block was also introduced to improve the robustness of the model to flipping and rotation. Two backbone networks, ResNet50 and ResNet101, were cross-combined with four task networks, Mask R-CNN, Cascade Mask R-CNN, MS R-CNN, and HTC, to evaluate the detection performance on group-housed breeding pigs.
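The channel and position attention described above can be sketched in a minimal NumPy form. This is an illustrative reconstruction, not the authors' implementation: the function names, the single-feature-map interface, and the `alpha` mixing weight in the DAU-style fusion are assumptions for clarity; a real network would learn projection weights and fusion scalars.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """PAU-style aggregation: the output at each position is a weighted
    sum of the features at ALL positions, with weights taken from the
    softmax of pairwise position similarities. feat: (C, H, W)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)        # flatten spatial grid -> (C, N)
    energy = x.T @ x                  # (N, N) position-to-position similarity
    attn = softmax(energy, axis=-1)   # each row: weights over all positions
    out = x @ attn.T                  # weighted sum over positions
    return out.reshape(C, H, W)

def channel_attention(feat):
    """CAU-style aggregation: models interdependencies among channels by
    attending over the channel dimension."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)        # (C, N)
    energy = x @ x.T                  # (C, C) channel-to-channel similarity
    attn = softmax(energy, axis=-1)
    out = attn @ x                    # re-weight channel features
    return out.reshape(C, H, W)

def dual_attention(feat, alpha=0.5):
    """DAU-style fusion: identity path plus a mix of the two attention
    outputs (alpha is a hypothetical fixed mixing weight; in practice it
    would be learned)."""
    return feat + alpha * position_attention(feat) \
                + (1 - alpha) * channel_attention(feat)
```

Both attention branches preserve the (C, H, W) shape of the input feature map, so the fused output can be passed on to the FPN levels unchanged.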
The results showed that embedding the DAU yielded the most significant performance gains across the different task networks and backbone networks, compared with the Convolutional Block Attention Module (CBAM), the Bottleneck Attention Module (BAM), and Spatial-Channel Squeeze & Excitation (SCSE). For HTC-R101-DAU, under Intersection over Union (IoU) thresholds of 0.5, 0.75, 0.5-0.95 (all targets), and 0.5-0.95 (large targets), the four Average Precision (AP) indicators increased by 1.7%, 1.7%, 2.1%, and 1.8%, respectively. The backbone network had a certain impact on pig detection within the same task network: R50 outperformed R101 in the task networks without any attention unit, whereas the AP values of the two backbones were relatively close after an attention unit was added. The CAU and PAU were also added separately to explore the influence of the channel and position dimensions of attention on the detection performance of the task networks. Experiments showed that the DAU achieved better AP indexes than the CAU or PAU alone, indicating that simultaneously fusing the two attention dimensions was complementary and improved the accuracy of position detection. In addition, a specific number of PAU units generally achieved better AP values than the CAU. A position-attention module was therefore constructed with 1 to 4 PAU units connected in series to capture pixel-level dense context with high accuracy. Under the same experimental conditions, the predicted values first increased and then decreased as the number of merged PAU units grew. Therefore, the HTC-R101-DAU model can detect live pigs more accurately and effectively in different scenes. The findings can provide a sound reference for subsequent intensive pig production.
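For readers unfamiliar with the IoU thresholds used in the AP indicators above, a minimal sketch of the box IoU computation follows; the corner-coordinate box format `(x1, y1, x2, y2)` is assumed for illustration and is independent of the paper's actual evaluation code.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes, each given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    # Union = sum of areas minus intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as a true positive at threshold 0.5 only if its IoU with a ground-truth pig exceeds 0.5; the 0.5-0.95 indicator averages AP over thresholds from 0.5 to 0.95 in steps of 0.05.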