Abstract:
To address the feature selection problem in the prediction model for the oil feature of flue-cured tobacco, an improved random forest (RF) feature selection strategy is proposed. First, the RF-Score of each feature was calculated by the RF feature selection algorithm, and the features were added to the feature subset one at a time in descending order of RF-Score. If adding a feature improved the classification accuracy of the classifier, the feature was retained; if the accuracy stayed the same or decreased, the feature was removed. The results show that when the RF feature selection algorithm was used to screen the hyperspectral features of flue-cured tobacco, with the 176 hyperspectral band features input in turn to the support vector machine (SVM) classifier in descending order of Gini coefficient, the first 64 band features gave the best classifier performance: the dimension of the feature subset was 64 and the classification accuracy was 93.33%. When the improved RF feature selection strategy was applied to the same 176 hyperspectral band features, only six band features (371.08 nm, 716.71 nm, 378.31 nm, 487.77 nm, 484.09 nm, and 535.85 nm) were needed to optimize the performance of the SVM classifier: the classification accuracy was 95% and the dimension of the feature subset was 6. This suggests that the improved RF feature selection strategy can reduce the dimensionality of the data while maintaining the performance of the classifier. Compared with the full hyperspectral band, the improved RF feature selection strategy reduced the number of features by 170 and improved the classification accuracy by 3.33%. Compared with the RF feature selection algorithm, it reduced the number of features by 58 and increased the classification accuracy by 1.67%.
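The retain-or-remove rule described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `improved_rf_selection`, the `evaluate` callback, the zero starting baseline, and the toy scoring function are all hypothetical. In practice, `ranked_features` would be the band features sorted by RF-Score, and `evaluate` would train and score an SVM classifier on the candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def improved_rf_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy selection over features already sorted by descending RF-Score.

    A feature is retained only if adding it strictly improves the accuracy
    returned by `evaluate`; otherwise it is removed again.
    """
    selected: List[str] = []
    best_acc = 0.0  # assumption: the empty subset has a baseline accuracy of 0
    for feature in ranked_features:
        candidate = selected + [feature]
        acc = evaluate(candidate)
        if acc > best_acc:
            # Accuracy improved: keep the feature and raise the bar.
            selected, best_acc = candidate, acc
        # Otherwise the accuracy was unchanged or lower: drop the feature.
    return selected, best_acc


# Toy stand-in for "train an SVM and return its accuracy" (hypothetical):
# informative bands add 10 points each, uninformative bands cost 1 point.
INFORMATIVE = {"b1", "b3"}


def toy_accuracy(subset: List[str]) -> float:
    useful = sum(1 for f in subset if f in INFORMATIVE)
    noise = len(subset) - useful
    return (50 + 10 * useful - noise) / 100


ranked = ["b1", "b2", "b3", "b4"]  # assumed to be sorted by RF-Score already
selected, acc = improved_rf_selection(ranked, toy_accuracy)
print(selected, acc)
```

In this toy run only `b1` and `b3` are retained, because `b2` and `b4` each lower the toy accuracy when added. The sketch makes one pass over the ranked list, so the classifier is evaluated once per feature (176 evaluations for 176 bands).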