Abstract:
Accurate identification of rice pests is crucial for national food security, and a robust pest recognition model is required in complex field scenarios. However, existing models are limited in cross-channel feature extraction when the sample distribution is extremely imbalanced. In this study, a dual-feature dynamic optimization model, ResNet-EAF, was proposed to recognize rice pests from an imbalanced image dataset. Two feature calibration mechanisms, Efficient Channel Attention (ECA) and Channel-wise Affine Adaptation (CAA), were integrated to optimize the feature representation jointly and precisely. A dual-feature optimization mechanism of "ECA feature screening, CAA feature calibration" was constructed after the Global Average Pooling (GAP) layer of the ResNet50 network. In the ECA module, inter-channel correlations were established within local windows via 1D convolution kernels of adaptively matched size, so that cross-channel interaction information was captured without dimensionality reduction. The weights of key pest feature channels were adaptively amplified and the interference from redundant channels was suppressed, thereby extracting the key features. The CAA module introduced two types of learnable parameters, a scaling factor and a translation term, so that each feature channel independently learned its own configuration. These parameters were learned during training to regulate the contribution of each channel, prioritizing the channels useful for classification and weakening the noise channels. In this way, the CAA module decoupled the weight-bias coupling of a conventional affine transformation into channel-wise independent scaling and translation. Calibrating the feature distribution in this manner improved the feature distinguishability among pest categories and alleviated domain shift, which enhanced generalization to unseen field data.
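The "ECA feature screening, CAA feature calibration" step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The adaptive kernel-size rule (with gamma = 2 and b = 1) follows the original ECA-Net formulation, while the function names, the operation on a single pooled channel vector, and the zero padding at the channel boundaries are assumptions made for illustration.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from ECA-Net: k = |(log2(C) + b) / gamma|_odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca_weights(pooled, kernel):
    """Channel attention weights from a 1D convolution across neighbouring channels.

    pooled: list of per-channel descriptors (the GAP output).
    kernel: list of k convolution weights (learned in a real model).
    Zero padding at both ends is an assumption of this sketch.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(pooled) + [0.0] * pad
    weights = []
    for c in range(len(pooled)):
        s = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate in (0, 1)
    return weights

def eca_caa(pooled, kernel, scale, shift):
    """ECA screening followed by channel-wise affine calibration.

    scale and shift are the two per-channel parameter sets (hypothetical
    names); each channel is scaled and translated independently.
    """
    attention = eca_weights(pooled, kernel)
    screened = [x * a for x, a in zip(pooled, attention)]
    return [g * x + t for g, x, t in zip(scale, screened, shift)]

# ResNet50 produces 2048 channels after GAP, which gives a kernel size of 7.
k = eca_kernel_size(2048)
features = [0.5] * 2048
out = eca_caa(features, [0.1] * k, [1.0] * 2048, [0.0] * 2048)
```

With a channel count of C and a kernel size of k, this sketch adds only k + 2C parameters, which is consistent with the small overhead the abstract reports.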
Notably, only very few learnable parameters were added, so the computational overhead was negligible and recognition remained efficient. A dynamic balanced loss strategy combining Focal Loss (FL), inverse frequency weighting, and modulation coefficients acted synergistically with the dual-feature module to tackle the extreme sample imbalance. Specifically, the inverse frequency weighting assigned class weights according to the proportion of samples in each class to initially balance the category distribution. The FL reduced the weights of easily classified majority-class samples via a modulation factor, thus focusing training on hard-to-classify minority-class samples. Additional modulation coefficients fine-tuned the loss gradient to correct the training bias caused by extreme imbalance. As a result, the dominant pest categories were identified accurately while the subtle core features of minority-class samples were also captured. The coverage of full-category recognition improved significantly, and the strategy was highly compatible with the stable deep feature extraction of ResNet50. A series of experiments was conducted on a self-built pest image dataset. The ResNet-EAF model achieved an accuracy of 98.06%, a macro-average recall of 93.93%, and an F1-score of 93.80%, which were 3.63, 6.96, and 5.67 percentage points higher than the baseline model, respectively, ranking first among 11 models. Furthermore, the recall rates of the minority-class pests Pyralidae and Arctiidae increased by 13.16% and 4.55%, respectively. Anti-interference ability and generalization were then evaluated on the public JUTE PEST dataset, collected in real field environments. The ResNet-EAF model achieved an accuracy of 98.35% (second only to DINOv2’s 98.42%), with the highest macro-average recall and the highest recall for the minority-class Beet Armyworm among the 11 models. Ablation experiments also verified the effectiveness of the dual-feature module.
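The way the three loss components could interact can be sketched as follows. The abstract does not give the exact formula, so this is an assumed form rather than the authors' definition: the per-sample loss is taken as m_c * w_c * (1 - p_t)^gamma * (-log p_t), where w_c is an inverse-frequency class weight normalized as N / (K * n_c), (1 - p_t)^gamma is the standard Focal Loss modulation factor, and m_c is a hypothetical per-class modulation coefficient.

```python
import math

def inverse_frequency_weights(class_counts):
    """Class weights inversely proportional to class frequency.

    The normalization N / (K * n_c) is an assumption; a perfectly
    balanced dataset then gives every class a weight of 1.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * n) for n in class_counts]

def balanced_focal_loss(probs, target, class_weights, gamma=2.0, modulation=None):
    """Per-sample loss combining class weighting, the focal term, and modulation.

    probs: predicted class probabilities for one sample (softmax output).
    target: index of the true class.
    modulation: optional per-class coefficients (hypothetical; defaults to 1).
    """
    p_t = probs[target]
    m = 1.0 if modulation is None else modulation[target]
    focal = (1.0 - p_t) ** gamma  # small for confidently correct samples
    return -m * class_weights[target] * focal * math.log(p_t)

# A 90/10 split: the minority class receives a nine times larger weight.
weights = inverse_frequency_weights([90, 10])
easy_majority = balanced_focal_loss([0.9, 0.1], 0, weights)
hard_minority = balanced_focal_loss([0.9, 0.1], 1, weights)
```

In this sketch a confidently classified majority-class sample contributes almost nothing to the loss, while a misclassified minority-class sample dominates it, which matches the behaviour the abstract describes.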
Used alone, the CAA module lacked a channel-importance pre-screening mechanism to locate key pest features, so its generalized feature calibration could not directly solve minority-class recognition. In contrast, combining the ECA and CAA modules formed a complete "feature screening and calibration" closed loop. First, the ECA module precisely screened the key channels to provide a high-quality "effective feature base" for the CAA module. The CAA module then performed channel-wise adaptive calibration on these features to improve their distinguishability. In addition, the combination of ECA and the dynamic balanced loss achieved the highest accuracy, macro-average recall, and F1-score among four mainstream attention mechanisms and five common loss functions. In summary, ResNet-EAF provides an efficient technical solution for pest monitoring under imbalanced data through ECA-CAA dual-feature dynamic optimization and a synergistic dynamic loss strategy. The practicality and robustness of the model were verified in complex field environments. The performance gains stemmed from the constructive interaction of FL, ECA, and CAA. The findings offer an extensible solution for reliable field pest recognition in precision agriculture.