Abstract:
Among the early warning signals of forest fires, smoke is the most critical and accessible visual cue. Its rapid and accurate identification is therefore essential for fire monitoring and for limiting damage. However, sufficient high-quality samples are extremely difficult to obtain in forest fire detection, owing to the suddenness of fire events, the low recurrence rate of fires in the same spatiotemporal regions, and the high variability of forests across areas. As a result, data features are scattered and sample diversity is insufficient. In this study, a Region Level Feature Extractor (Relefe) deep neural network was proposed to recognize forest fire smoke in limited-sample scenarios. Region-level features were adaptively extracted from low-level visual information, so that the fine details of forest fire smoke were learned and the model performed better than conventional approaches under limited samples. Relefe adopted a dual-branch collaborative architecture. The vision branch performed probabilistic modeling of region-level perception domains: it focused on target smoke regions through a per-pixel probability distribution concentrated at extreme values (close to 0 or 1). ReLU activation was used in the intermediate layers and sigmoid activation at the terminal layer, which ensured effective transmission of intermediate features and normalized the final probability outputs. By contrast, the feature branch refined local features. The dynamic characteristics of the smoke were preserved using two convolutional layers with padding operations, which maintained the feature map dimensions so that the convolution outputs could be stacked along the channel dimension. As such, the critical low-level features were retained after optimization. The two branches were fused via global convolution.
Region-level features were generated by element-wise multiplication of the two feature matrices, effectively linking local details with global semantics. The effectiveness of Relefe was validated by integrating it with two mainstream deep vision models: ResNet18 and Vision Transformer (ViT). In ResNet18, the output of Relefe was transformed via feature map dimensional conversion to match the input structure of the backbone network, thus ensuring lossless transmission of region-level features. In ViT, Relefe replaced the standard fixed-size patch embedding with adaptive irregular patch partitioning, in which the patch distribution was adjusted dynamically according to the smoke morphology. The number of patches was reduced, which simplified the encoder structure and lowered the computational cost. The results demonstrated that Relefe significantly improved model performance. Relefe-ViT achieved an accuracy of 85.09%, more than 13 percentage points higher than the baseline ViT (71.93%); Relefe-ResNet18 reached 85.96%, more than 8 percentage points higher than ResNet18 (77.19%). Comparative experiments on the scale of Relefe (width and depth) showed that better performance was achieved with fewer channels, whereas greater network depth slightly decreased the accuracy, which nevertheless remained higher than that of the baseline models. These findings can provide a lightweight and highly robust solution for forest fire smoke detection under limited samples. Relefe can also enhance the accuracy and efficiency of smoke recognition on large-scale datasets and offer valuable insights for small-sample learning in computer vision.
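
The dual-branch idea described above can be illustrated with a minimal, single-channel sketch in plain Python. It is not the authors' implementation. The function names, the 3x3 kernels, and the single-channel simplification are all assumptions made for illustration, and the global convolution step is omitted. The sketch only shows how a sigmoid-normalized probability map (vision branch) can gate a padded-convolution feature map (feature branch) through element-wise multiplication.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply(fn, grid):
    """Apply a scalar function to every element of a 2D grid."""
    return [[fn(v) for v in row] for row in grid]


def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output keeps the input height and width."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def vision_branch(image, k1, k2):
    """Hypothetical vision branch: ReLU in the intermediate layer, sigmoid at the end."""
    hidden = apply(relu, conv2d_same(image, k1))
    return apply(sigmoid, conv2d_same(hidden, k2))  # per-pixel values in (0, 1)


def feature_branch(image, k1, k2):
    """Hypothetical feature branch: two padded convolutions that keep the map size."""
    hidden = apply(relu, conv2d_same(image, k1))
    return conv2d_same(hidden, k2)


def fuse(prob_map, features):
    """Element-wise multiplication of the two maps (global convolution omitted)."""
    return [[p * f for p, f in zip(p_row, f_row)]
            for p_row, f_row in zip(prob_map, features)]


if __name__ == "__main__":
    # Toy 4x4 "image" with a bright 2x2 blob standing in for a smoke region.
    image = [[0.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 1.0, 0.0],
             [0.0, 1.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
    blur = [[1 / 9] * 3 for _ in range(3)]          # illustrative kernel, not learned
    sharpen = [[0.0, -1.0, 0.0], [-1.0, 5.0, -1.0], [0.0, -1.0, 0.0]]
    region = fuse(vision_branch(image, blur, sharpen),
                  feature_branch(image, sharpen, blur))
    for row in region:
        print([round(v, 3) for v in row])
```

In a real model the kernels would be learned and the maps would have multiple channels. The sketch only makes the gating relationship between the two branches concrete.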