Video counting method for bagged grapes based on improved YOLOv9s and adaptive Kalman filter

LYU Jia (吕佳); RAN Jie (冉洁)

doi:10.11975/j.issn.1002-6819.202412026


Counting bagging grape using improved YOLOv9s and adaptive Kalman filter

LYU Jia,
RAN Jie

Abstract

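Abstract: To address the insufficient real-time performance of existing fruit counting methods and the tracking failures caused by the occlusion of bagged grapes and by detection noise, this study proposes a video counting method for bagged grapes based on an improved YOLOv9s and an adaptive Kalman filter. The method consists of three sub-methods: an improved YOLOv9s detection model, an adaptive Kalman filter tracking algorithm, and line-drawing counting. In the detection stage, to reduce the number of parameters of YOLOv9s and increase its inference speed while strengthening detection in occluded scenes, an EFEM (efficient feature enhancement module) was designed to optimize feature extraction, and a SEAM (spatially enhanced attention module) was introduced to improve detection under occlusion. In the tracking stage, an adaptive Kalman filter tracking algorithm was proposed to counter the loss of trajectory prediction accuracy caused by detection noise from camera shake, rapid motion, and similar factors. The algorithm automatically adjusts the noise estimate according to the detection confidence, which improves the accuracy of the Kalman filter's trajectory prediction for bagged grapes and thereby the tracking performance. In the counting stage, a line-drawing counting strategy was used to count bagged grapes automatically. Experimental results showed that, for detection, the improved YOLOv9s model reduced the number of parameters by 29.6% and reached an inference speed of 70 frames/s; for tracking, the improved algorithm increased higher-order tracking accuracy (HOTA), multi-object tracking accuracy (MOTA), and the ID F1 score (IDF1, the ID harmonic mean) by 4.3, 2.2, and 2.5 percentage points, respectively; for counting, the average counting accuracy reached 80.0%. In summary, the method shows good potential for real-time tracking and counting and can provide technical support for pre-harvest yield estimation of bagged grapes.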
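The confidence-driven noise adjustment described above can be sketched as a Kalman filter whose measurement noise shrinks for confident detections. The abstract does not give the exact rule, so this sketch assumes the NSA scheme used in StrongSORT, R_k = (1 - c_k) R, where c_k is the detection confidence. The class name, the one-dimensional constant-velocity state, and the noise values are hypothetical simplifications of a full bounding-box filter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdaptiveKalman1D:
    """Constant-velocity Kalman filter for one box coordinate (hypothetical
    simplification). Measurement noise is scaled by (1 - confidence), the
    NSA rule from StrongSORT, assumed here as a stand-in for the paper's rule."""
    pos: float
    vel: float = 0.0
    q: float = 1.0        # process noise intensity (assumed value)
    r_base: float = 10.0  # nominal measurement noise variance (assumed value)
    P: List[List[float]] = field(
        default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])

    def predict(self) -> float:
        # State transition x <- F x with F = [[1, 1], [0, 1]] (dt = 1 frame).
        self.pos += self.vel
        (a, b), (c, d) = self.P
        # Covariance P <- F P F^T + Q with Q = q * I.
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.pos

    def update(self, z: float, confidence: float) -> float:
        # Adaptive measurement noise: confident detections get a smaller R.
        r = (1.0 - confidence) * self.r_base
        (a, b), (c, d) = self.P
        s = a + r              # innovation variance S = H P H^T + R, H = [1, 0]
        k0, k1 = a / s, c / s  # Kalman gain K = P H^T / S
        y = z - self.pos       # innovation (measurement residual)
        self.pos += k0 * y
        self.vel += k1 * y
        # Covariance P <- (I - K H) P.
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.pos
```

For the same observation, a high-confidence detection pulls the estimate closer to the measurement than a low-confidence one, so jittery, low-confidence boxes from camera shake disturb the predicted trajectory less. A full tracker such as ByteTrack runs one filter per track over an eight-dimensional box state; since a confidence of 1 drives R to zero, a practical version might also clamp the confidence below 1.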

Abstract: Grapes are among the fruits with the largest cultivation area, highest yield, and greatest economic value in China. Bagging is often used to reduce the impact of pests and diseases on grape quality during the harvest period. Accurate yield estimation helps growers plan picking, sales, and storage, reducing economic losses caused by supply-demand mismatches, and accurate counting of bagged grapes is a prerequisite for such estimation. However, existing fruit counting methods often lack real-time performance and suffer from tracking failures caused by the occlusion of bagged grapes and by unhandled detection noise. In this study, a video counting method for bagged grapes was proposed based on an improved YOLOv9s and an adaptive Kalman filter. It consists of three modules: the improved YOLOv9s detection model, an adaptive Kalman filter tracking algorithm, and line-drawing counting. In detection, the original RepNCSPELAN4 module in YOLOv9s was replaced with an efficient feature enhancement module (EFEM) to reduce the number of model parameters and increase inference speed, making the model fast enough for real-time detection. The EFEM was designed to learn selectively from partial feature maps of the bagged grapes, enabling efficient feature extraction and faster inference; specifically, the FasterNet module was used to extract spatial features efficiently while minimizing redundant computation and memory access. A spatially enhanced attention module (SEAM) was introduced to further improve detection under occlusion. SEAM learns the relationship between occluded and unoccluded regions and uses it to predict and compensate for occluded features, improving the detection accuracy of fully and partially occluded bagged grapes.
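The abstract attributes EFEM's efficiency to FasterNet, whose core operator is partial convolution (PConv): a regular convolution is applied to only the first c_p of c channels and the remaining channels pass through untouched, so FLOPs scale with c_p^2 rather than c^2. The pure-Python sketch below illustrates that operator on nested lists. It is not the authors' EFEM, whose internal layout is not given here, and the function names are hypothetical.

```python
from typing import List

Channel = List[List[float]]    # one H x W feature map
Tensor = List[Channel]         # a stack of C feature maps
Kernels = List[List[Channel]]  # kernels[o][i]: k x k kernel, input i -> output o


def conv2d_same(x: Tensor, kernels: Kernels) -> Tensor:
    """Plain multi-channel 2D convolution, stride 1, zero 'same' padding."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0][0])
    pad = k // 2
    out: Tensor = []
    for per_input in kernels:  # one output channel per entry
        y = [[0.0] * w for _ in range(h)]
        for i in range(c_in):
            ker = per_input[i]
            for r in range(h):
                for col in range(w):
                    for dr in range(k):
                        for dc in range(k):
                            rr, cc = r + dr - pad, col + dc - pad
                            if 0 <= rr < h and 0 <= cc < w:
                                y[r][col] += ker[dr][dc] * x[i][rr][cc]
        out.append(y)
    return out


def partial_conv(x: Tensor, kernels: Kernels) -> Tensor:
    """FasterNet-style partial convolution: convolve only the first c_p
    channels (c_p = len(kernels)) and pass the remaining channels through."""
    c_p = len(kernels)
    return conv2d_same(x[:c_p], kernels) + x[c_p:]


def conv_flops(h: int, w: int, k: int, c: int) -> int:
    """FLOPs of a k x k conv with c input and c output channels,
    counted as h * w * k^2 * c^2."""
    return h * w * k * k * c * c
```

With c_p = c/4, the partial convolution costs 1/16 of a regular convolution over all channels. In FasterNet, PConv is followed by pointwise (1×1) convolutions so that information in the untouched channels is still mixed; that part is omitted here.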
In tracking, an adaptive Kalman filter algorithm was proposed to counter detection noise caused by camera shake and rapid movement, which degrades the accuracy of Kalman filter trajectory prediction. The algorithm automatically adjusts the noise estimate according to the detection confidence, improving trajectory prediction and hence tracking. Line-drawing counting was used to count bagged grapes in real time: whenever the center of a tracked bagged grape crosses a virtual counting line, the count increases by one. The experimental dataset was collected at the PaiDengTe Technology Demonstration Park in Bishan District, Chongqing, China, and consisted of 700 original images of bagged grapes and six video clips. The images were randomly divided at a ratio of 7:2:1 into a training set of 490 images, a validation set of 140 images, and a test set of 70 images, and the six video clips were used to test counting performance. Image augmentation, including saturation adjustment, brightness variation, image mirroring, and Gaussian noise addition, was applied to the training set, expanding it to 2100 images and improving the robustness and generalization of the detection model. Experimental results showed that the improved YOLOv9s model (ES-YOLOv9s) outperformed the five other models compared, achieving the highest mean average precision (96.9%) and recall (93.1%) at an inference speed of 70 frames per second. Compared with the original YOLOv9s, ES-YOLOv9s reduced the number of parameters by 29.6% and the floating-point operations by 10.9 G, while increasing the frame rate by 20 frames per second.
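The line-drawing counting rule can be sketched over tracker output as follows. The sketch assumes a vertical counting line x = line_x (the abstract gives neither the line orientation nor the direction of camera motion) and counts a track the first time its center changes side of the line; the function name and the per-frame input format are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def count_line_crossings(
    frames: Iterable[Dict[int, Tuple[float, float]]],
    line_x: float,
) -> int:
    """Count tracks whose box centre crosses the vertical line x = line_x.

    `frames` yields, for each video frame, a mapping from track ID to the
    (cx, cy) centre of that track's box. Each track ID is counted at most once.
    """
    last_side: Dict[int, bool] = {}  # track ID -> was the centre left of the line?
    counted: Set[int] = set()
    for tracks in frames:
        for tid, (cx, _cy) in tracks.items():
            left = cx < line_x
            prev = last_side.get(tid)
            if prev is not None and prev != left:
                counted.add(tid)  # a set keeps each ID counted only once
            last_side[tid] = left
    return len(counted)
```

Because the count is keyed on track IDs, an ID switch can make one grape count twice, which is one reason improvements in identity metrics such as IDF1 can matter for counting accuracy.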
In terms of tracking performance, the adaptive Kalman filter tracking algorithm achieved 58.6% higher-order tracking accuracy (HOTA), 63.6% multi-object tracking accuracy (MOTA), and 78.8% ID F1 score (IDF1, the harmonic mean of ID precision and ID recall), improvements of 4.3, 2.2, and 2.5 percentage points over ByteTrack, respectively. In terms of counting performance, line-drawing counting achieved an average accuracy of 80.0% relative to manual counting. In conclusion, the proposed video counting method based on the improved YOLOv9s and the adaptive Kalman filter shows good potential for real-time tracking and counting and can provide technical support for pre-harvest yield estimation of bagged grapes.
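For reference, the standard definitions of two of the reported tracking metrics can be written as short functions. HOTA is omitted because it combines detection and association accuracy over a range of localization thresholds and does not reduce to one line, and the abstract does not say which evaluation toolkit was used. Since the reported gains are in percentage points, ByteTrack's scores were 54.3% HOTA, 61.4% MOTA, and 76.3% IDF1.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with all counts summed
    over every frame of the sequence (can be negative)."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """ID F1 score: 2 IDTP / (2 IDTP + IDFP + IDFN), which equals the
    harmonic mean of ID precision and ID recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of two positive rates, used to cross-check idf1."""
    return 2 * p * r / (p + r)
```

MOTA is dominated by detection errors (FN, FP), while IDF1 rewards keeping the same ID on the same object over time, so the two metrics respond differently to tracker changes.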
