Abstract:
Spatial distribution maps of crops are required in a wide range of agricultural and environmental applications, such as monitoring land use patterns, cropping intensity, and grain yields for sustainable agriculture. Current crop mapping techniques rely primarily on machine learning algorithms, whose accuracy and portability depend heavily on training data. Deep learning models are even more data-dependent, requiring large amounts of training data. However, reliable, accurate, and sufficiently large labeled datasets remain scarce for such data-driven tasks, particularly in large-scale or heterogeneous agricultural regions. The difficulty of obtaining high-quality training data is compounded by cloud contamination in optical remote sensing imagery, inconsistent field surveys, and phenological variability across landscapes, making scalable crop mapping a persistent challenge. Nevertheless, crops generally follow a relatively stable, biologically driven growth rhythm shaped by environmental conditions and agronomic practices. Each growth cycle can be divided into characteristic phenological stages (e.g., emergence, vegetative growth, flowering, and maturity), which time-series remote sensing data can systematically capture and quantify through vegetation indices such as the NDVI or EVI. Because expert knowledge of these temporal dynamics can be encoded into computational models, it can reduce the dependence on large training datasets. In this study, a cropping pattern mapping method was proposed that incorporates crop phenology knowledge into a classification framework based on Bayesian Networks. Key phenological features at critical growth stages were extracted from a small number of representative training samples. The phenological knowledge was then probabilistically encoded to define the conditional dependencies among variables, and a Bayesian Network tailored to phenology-driven crop type classification was constructed.
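The classification idea described above can be illustrated with a minimal sketch. The abstract does not specify the network structure, features, states, or probabilities, so everything below is a hypothetical assumption: a naive-Bayes-shaped network in which a crop-type node is the only parent of two discretized phenological features (the number of NDVI peaks per year and the season of maximum NDVI), with conditional probability tables (CPTs) filled in by hand to stand in for expert knowledge. The crop classes, the 0.5 peak threshold, the monthly sampling, and the example series are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical expert knowledge encoded as a prior and CPTs.
# Class names, feature states, and probabilities are illustrative assumptions.
PRIOR: Dict[str, float] = {"winter_crop": 1 / 3, "summer_crop": 1 / 3, "double_crop": 1 / 3}
CPT_PEAKS: Dict[str, Dict[str, float]] = {
    "winter_crop": {"one": 0.9, "two": 0.1},
    "summer_crop": {"one": 0.9, "two": 0.1},
    "double_crop": {"one": 0.15, "two": 0.85},
}
CPT_SEASON: Dict[str, Dict[str, float]] = {
    "winter_crop": {"spring": 0.8, "summer": 0.1, "other": 0.1},
    "summer_crop": {"spring": 0.1, "summer": 0.8, "other": 0.1},
    "double_crop": {"spring": 0.45, "summer": 0.45, "other": 0.1},
}


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def count_peaks(series: List[float], threshold: float = 0.5) -> int:
    """Count local NDVI maxima at or above `threshold` (a rough proxy for growing cycles)."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i] >= threshold and series[i] > series[i - 1] and series[i] >= series[i + 1]
    )


def peak_season(series: List[float]) -> str:
    """Map the month index (0 = January) of the maximum NDVI to a coarse season."""
    month = max(range(len(series)), key=lambda i: series[i])
    if 2 <= month <= 5:
        return "spring"
    if 6 <= month <= 9:
        return "summer"
    return "other"


def classify(series: List[float]) -> Tuple[str, Dict[str, float]]:
    """Posterior P(crop | features) by Bayes' rule, assuming the two features are
    conditionally independent given the crop type (naive-Bayes structure)."""
    peaks = "two" if count_peaks(series) >= 2 else "one"
    season = peak_season(series)
    joint = {c: PRIOR[c] * CPT_PEAKS[c][peaks] * CPT_SEASON[c][season] for c in PRIOR}
    evidence = sum(joint.values())
    posterior = {c: p / evidence for c, p in joint.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical monthly NDVI profile with a spring and a summer peak.
double = [0.2, 0.3, 0.6, 0.75, 0.5, 0.3, 0.6, 0.8, 0.6, 0.3, 0.2, 0.2]
label, post = classify(double)
print(label)  # double_crop
```

No training is needed to run this model: the CPTs alone drive inference, which mirrors (under these assumptions) the idea of expert knowledge acting in place of labeled data. In practice, each monthly value would be computed with `ndvi` from cloud-free NIR and red composites.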
Experiments were conducted in a region with highly complex cropping patterns to validate the effectiveness of the model. The results show that: 1) Guided by phenological knowledge, the model parameters could be established either without training data or with only a limited number of samples. Accuracy was therefore maintained under sample scarcity, with an overall mapping precision above 92%, indicating that prior knowledge can effectively serve as a data surrogate in knowledge-driven remote sensing; 2) The Bayesian Network classification framework exhibited a 'weak learning, strong inference' behavior: inference relied on domain-specific knowledge structures rather than on the limited samples, thereby avoiding overfitting, and, unlike data-driven models, it did not depend on highly precise fitting of the training data. Compared with conventional machine learning, the dependence on training data was reduced by 42%. By mitigating data dependency and overfitting, the framework improved the interpretability, transparency, and portability of the classification relative to machine learning models. It also generalized robustly across different temporal and spatial contexts, especially where training data were sparse or costly. Integrating domain knowledge with probabilistic graphical modeling thus offers a promising pathway for crop mapping in data-constrained environments. The findings provide a practical alternative to fully data-driven approaches for remote sensing classification tasks, and the resulting geospatial data layers can support decision-making in agricultural planning and food security assessment.
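Setting parameters "without training data or with only a limited number of samples" can be sketched with one common technique: treat each expert-specified CPT row as the mean of a Dirichlet prior and combine it with observed counts. The abstract does not state that the authors used this scheme; the helper `blend_cpt_row` and its `strength` parameter (the prior's equivalent sample size) are hypothetical, and the probabilities and counts below are illustrative.

```python
from typing import Dict


def blend_cpt_row(expert: Dict[str, float], counts: Dict[str, int], strength: float) -> Dict[str, float]:
    """Posterior-mean estimate of one CPT row under a Dirichlet prior whose mean is
    the expert distribution and whose concentration is `strength`.
    Hypothetical helper for illustration; not taken from the source study."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    # Total number of labeled samples observed for this parent configuration.
    n = sum(counts.get(state, 0) for state in expert)
    return {
        state: (strength * p + counts.get(state, 0)) / (strength + n)
        for state, p in expert.items()
    }


# Hypothetical expert belief for P(n_peaks | double_crop).
expert_row = {"one": 0.15, "two": 0.85}
print(blend_cpt_row(expert_row, {}, strength=10.0))                    # no samples: the expert row
print(blend_cpt_row(expert_row, {"one": 1, "two": 4}, strength=10.0))  # nudged toward 1/5 and 4/5
```

With no samples the expert row is returned unchanged, so the network is usable before any field data exist; as labeled samples accumulate, the estimate moves toward their empirical frequencies at a rate controlled by `strength`.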