Learning Result Prediction Based on Improved SMOTE Algorithm and Ensemble Model
-
Abstract
A single machine learning algorithm often generalizes poorly across classification and prediction tasks in different fields, and severe class imbalance in a dataset further degrades prediction performance. To address both problems, a learning result prediction method based on the Synthetic Minority Oversampling Technique (SMOTE) and an ensemble model was proposed. The traditional SMOTE algorithm generates new synthetic samples by interpolating between minority class samples, which can introduce noise and produce synthetic samples that are highly similar to one another. To address these issues, an improved SMOTE algorithm was proposed that uses distance calculations to remove noisy and easily confused samples, yielding clean and highly discriminative synthetic samples. An ensemble method was then used to adjust the weights of samples and classifiers, producing a stronger classifier with improved classification performance. Experimental results on the public online learning dataset Kalboard 360 show that combining the Extremely Randomized Trees (ERT) classifier with the improved SMOTE algorithm and the ensemble model achieves a prediction accuracy of 97.9%, a 5.5% increase over a single ERT classifier. This demonstrates that the proposed SMOTE algorithm can generate high-quality balanced data, and that the ensemble learning model performs significantly better than a single machine learning algorithm.
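The two oversampling steps summarized in the abstract, interpolation between minority samples followed by distance-based removal of confusable samples, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the neighbour count `k`, and in particular the filtering rule (keep a synthetic point only if its nearest real minority sample is closer than its nearest majority sample) are assumptions chosen for illustration, since the abstract does not specify the exact distance criterion.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_interpolate(minority, n_new, k=3, rng=None):
    """Classic SMOTE step: create n_new points, each lying on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        target = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + lam * (t - b) for b, t in zip(base, target))
        )
    return synthetic


def filter_synthetic(synthetic, minority, majority):
    """Hypothetical filtering rule (an assumption, not the paper's):
    drop a synthetic point when a real majority sample is at least as
    close to it as the nearest real minority sample."""
    kept = []
    for s in synthetic:
        d_min = min(euclidean(s, p) for p in minority)
        d_maj = min(euclidean(s, p) for p in majority)
        if d_min < d_maj:
            kept.append(s)
    return kept


if __name__ == "__main__":
    # Toy 2-D data, invented for illustration only
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    majority = [(5.0, 5.2), (6.0, 6.0)]
    candidates = smote_interpolate(minority, 20, k=2, rng=random.Random(42))
    clean = filter_synthetic(candidates, minority, majority)
    print(len(candidates), "generated,", len(clean), "kept")
```

In this sketch the filter runs after generation, so the number of retained samples can fall short of the number requested; a practical pipeline would likely repeat generation until the classes are balanced.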
-
-