Abstract:
High breeding density poses a great risk to the prevention and control of swine fever in intensive pig production. Accurate and rapid detection of individual live pigs can help take timely countermeasures to minimize the incidence of disease. However, factors such as pig adhesion and occlusion by sundries make the detection of individual pigs in multi-target scenes difficult. In this study, a dual-attention feature pyramid network was proposed to rapidly detect group-housed breeding pigs. A total of 45 live pigs aged 20 to 105 days in 8 pens were selected as the research objects. A head-up angle of view was used to collect a total of 3 834 labeled images, of which 2 490 were set as the training set, 480 as the validation set, and 864 as the test set. Two types of attention units were introduced into the Feature Pyramid Network (FPN) to encode semantic interdependencies in the channel dimension (the Channel Attention Unit (CAU)) and the spatial dimension (the Position Attention Unit (PAU)), respectively. The rationale was that an attention-based method increases the weight of regional information for better instance detection, while suppressing secondary information for a better model. The CAU selectively enhanced the interdependencies among channels by integrating the associated features, while the PAU selectively aggregated the feature at each position through a weighted sum of features at all positions. A Dual Attention Unit (DAU) was then proposed to flexibly fuse CAU features with PAU information. An asymmetric convolution block was also introduced to improve the robustness of the model to flipping and rotation. Two backbone networks, ResNet50 and ResNet101, were cross-combined with four task networks, Mask R-CNN, Cascade Mask R-CNN, MS R-CNN, and HTC, to evaluate the detection performance on group-housed breeding pigs.
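The channel and position attention described above can be sketched in a minimal NumPy form. This is an illustrative reconstruction, not the authors' implementation: the function names, the single-feature-map interface, and the `alpha` mixing weight in the DAU-style fusion are assumptions for clarity; a real network would learn projection weights and fusion scalars.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    """PAU-style aggregation: the output at each position is a weighted
    sum of the features at ALL positions, with weights taken from the
    softmax of pairwise position similarities. feat: (C, H, W)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)        # flatten spatial grid -> (C, N)
    energy = x.T @ x                  # (N, N) position-to-position similarity
    attn = softmax(energy, axis=-1)   # each row: weights over all positions
    out = x @ attn.T                  # weighted sum over positions
    return out.reshape(C, H, W)

def channel_attention(feat):
    """CAU-style aggregation: models interdependencies among channels by
    attending over the channel dimension."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)        # (C, N)
    energy = x @ x.T                  # (C, C) channel-to-channel similarity
    attn = softmax(energy, axis=-1)
    out = attn @ x                    # re-weight channel features
    return out.reshape(C, H, W)

def dual_attention(feat, alpha=0.5):
    """DAU-style fusion: identity path plus a mix of the two attention
    outputs (alpha is a hypothetical fixed mixing weight; in practice it
    would be learned)."""
    return feat + alpha * position_attention(feat) \
                + (1 - alpha) * channel_attention(feat)
```

Both attention branches preserve the (C, H, W) shape of the input feature map, so the fused output can be passed on to the FPN levels unchanged.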
The results showed that embedding the DAU yielded the most significant performance gains across the different task networks and backbone networks, compared with the Convolutional Block Attention Module (CBAM), the Bottleneck Attention Module (BAM), and Spatial-Channel Squeeze & Excitation (SCSE). For HTC-R101-DAU, under Intersection over Union (IoU) thresholds of 0.5, 0.75, 0.5-0.95 (all targets), and 0.5-0.95 (large targets), the four Average Precision (AP) indicators increased by 1.7%, 1.7%, 2.1%, and 1.8%, respectively. The backbone network had a certain impact on pig detection within the same task network: R50 outperformed R101 in the task networks without any attention unit, whereas the AP values of the two backbones were relatively close after an attention unit was added. The CAU and PAU were also added separately to explore the influence of the channel and position dimensions of attention on the detection performance of the task networks. Experiments showed that the DAU achieved better AP indexes than the CAU or PAU alone, indicating that simultaneously fusing the two attention dimensions was complementary and improved the accuracy of position detection. In addition, a specific number of PAU units generally achieved better AP values than the CAU. A position-attention module was therefore constructed with 1 to 4 PAU units connected in series to capture pixel-level dense context with high accuracy. Under the same experimental conditions, the predicted values first increased and then decreased as the number of merged PAU units grew. Therefore, the HTC-R101-DAU model can detect live pigs more accurately and effectively in different scenes. The findings can provide a sound reference for subsequent intensive pig production.
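For readers unfamiliar with the IoU thresholds used in the AP indicators above, a minimal sketch of the box IoU computation follows; the corner-coordinate box format `(x1, y1, x2, y2)` is assumed for illustration and is independent of the paper's actual evaluation code.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes, each given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    # Union = sum of areas minus intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as a true positive at threshold 0.5 only if its IoU with a ground-truth pig exceeds 0.5; the 0.5-0.95 indicator averages AP over thresholds from 0.5 to 0.95 in steps of 0.05.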