Abstract:
Accurate crop type classification from remote sensing imagery is a common requirement in precision agriculture. Its reliability depends largely on how representative the training samples are, especially at the county scale, where costly sample acquisition limits the sample size. Conventional sampling strategies, such as random sampling, systematic (grid) sampling, and empirical stratification, primarily emphasize spatial uniformity or prior zoning, but they do not capture crop spectral variability or its interaction with sample size. In this study, a sampling strategy based on geographical similarity in feature space was applied, with county-level crop classification as the experimental scenario. Stratified random sampling and systematic sampling were adopted as baselines. Three classification models, namely support vector machine (SVM), random forest (RF), and temporal convolutional network (TCN), were employed in comparative experiments under multiple sample-size conditions. Model performance was evaluated in terms of classification accuracy and sample representativeness. Experimental results show that the sampling strategies performed best with 42 training samples. The similarity sampling strategy achieved effective coverage of the feature space with fewer samples and higher accuracy than the baselines. Specifically, classification accuracy improved by approximately 2%-12% for the SVM and TCN models, whereas the differences among sampling strategies for the RF model remained relatively small. Accuracy differences among the sampling strategies gradually narrowed as sample size increased, indicating that similarity sampling is well suited to the early stage of a study, when samples are limited. Furthermore, sampling effectiveness depended mainly on crop-specific spectral heterogeneity. Crops with stable spectral signatures achieved high classification accuracy with limited samples.
In contrast, crops with higher spectral heterogeneity or mixed spectral behavior benefited more from similarity sampling. Similarity sampling also enhanced sample representativeness by expanding coverage of the feature space, which made it particularly suitable for complex crop classification tasks with limited samples. However, a relatively flat plain with low environmental heterogeneity may limit how much of the potential of similarity sampling under more complex environmental gradients can be observed, so the strategy should also be tested in mountainous or highly heterogeneous regions. In addition, explicit indicators could be used to characterize classification accuracy and the quantitative relationship between sample representativeness and environmental complexity. Future work can incorporate explicit representativeness metrics and examine the coupling mechanisms among sample representativeness, landscape heterogeneity, and accuracy. This could further improve the robustness of sampling strategies for crop classification using remote sensing.