QI Haixia, HUANG Huiliang, LUO Xiwen, et al. Peanut yield prediction based on polynomial regression and stacked model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2025, 41(8): 165-174. DOI: 10.11975/j.issn.1002-6819.202407244

Peanut yield prediction based on polynomial regression and stacked model

  • At present, agricultural yield forecasting relies mainly on a single-model approach, in which one model is trained on all available features. However, this approach cannot capture the complex nonlinear relationships between meteorological factors and yield. Traditional decomposition methods (e.g., moving average, high-pass filtering) are also limited in fitting the long-term trend. In this study, an integrated model was developed to forecast peanut yield in southwestern Guangdong Province, China. Meteorological data (temperature, precipitation, sunshine duration, wind speed, and relative humidity) and yield records were collected from 16 test regions from 2000 to 2023. Polynomial regression and stacking ensemble learning were integrated in three steps: 1) Long-term trend modeling: polynomial regression was used to quantify the impact of technological advancements and agricultural management on yield; 2) Dimensionality reduction: principal component analysis (PCA) was used to extract 12 principal components, with a cumulative variance contribution of 90%, from the normalized meteorological data; 3) Stacked generalization: K-nearest neighbors (KNN), random forest (RF), and gradient boosting regression (GBR) served as the base learners and Lasso regression as the meta-learner, and the model was optimized with cross-validation for the meteorological yield analysis. The improved model achieved a mean absolute percentage error (MAPE) of 2.09%, a root mean square error (RMSE) of 78.55 kg/hm², and a coefficient of determination (R²) of 0.96. Compared with the individual machine learning models, the relative error was reduced by 0.22 to 0.68 percentage points.
Specifically, polynomial regression combined with the stacked model achieved the lowest MAPE (2.09%), mean absolute error (MAE, 57.10 kg/hm²), and RMSE (78.55 kg/hm²) among the compared models, which included KNN (MAPE: 2.70%), RF (MAPE: 2.31%), and GBR (MAPE: 2.77%). The R² of 0.96 indicates that the improved model explained 96% of the variance in the actual yield data. Early forecast testing with pre-August meteorological inputs (two months before harvest) also maintained high accuracy (R²=0.94), with a MAPE of 2.91% and an MAE of 71.88 kg/hm², showing that the improved model is effective for early yield forecasting. A series of trials was carried out to validate the improved model for 2020-2023. The average MAPE of 4.62% across the regions further confirmed the robustness of the model. However, forecast accuracy varied by region: the MAPE ranged from 0.25% in Yunfu to 8.15% in Zhanjiang, indicating a strong influence of regional heterogeneity and non-meteorological factors, such as soil properties and farming practices. Different trend decomposition methods were also compared, including moving average, exponential smoothing, high-pass filtering, and polynomial regression. Polynomial regression outperformed the rest: it accurately captured the long-term yield trends driven by technological advancements and smoothed out the effects of extreme weather events, giving a more stable and accurate separation of trend yield and meteorological yield. In conclusion, integrating polynomial regression for trend analysis with a stacked ensemble model for meteorological yield forecasting offers a robust and accurate framework for forecasting peanut yield. The early forecasts, together with the observed regional variations, can inform agricultural management and market strategy.
Future research can further improve the model by incorporating additional variables, such as soil properties and satellite data. Region-specific models that consider local agricultural practices and environmental conditions can also be developed.
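
The three steps described in the abstract (polynomial trend, PCA, stacked ensemble) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. The data are synthetic stand-ins, and the polynomial degree of 2, the single pooled trend across regions, the use of StandardScaler for normalization, the 30 input features, and all hyperparameters are assumptions made only for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data, NOT the study's records: 16 regions x 24 years.
years = np.tile(np.arange(2000, 2024), 16)
t = years - 2000
weather = rng.normal(size=(years.size, 30))  # 30 hypothetical weather features
yield_obs = (
    2000 + 40 * t - 0.5 * t**2                 # slow technology-driven trend
    + 50 * weather[:, 0] - 30 * weather[:, 1]  # weather-driven fluctuation
    + rng.normal(scale=20, size=years.size)    # noise
)

train = years < 2020  # hold out 2020-2023 for validation
test = ~train

# Step 1: polynomial trend (degree 2 is an assumption), fitted on training years.
coef = np.polyfit(t[train], yield_obs[train], deg=2)
trend = np.polyval(coef, t)
met_yield = yield_obs - trend  # meteorological yield = observed - trend

# Steps 2 and 3: scale -> PCA keeping 90% of variance -> stacked ensemble.
stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Lasso(alpha=0.1),  # meta-learner; alpha is an assumption
    cv=5,  # out-of-fold base predictions are used to train the meta-learner
)
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90, svd_solver="full"),
    stack,
)
model.fit(weather[train], met_yield[train])

# Final forecast = extrapolated trend + predicted meteorological yield.
pred = trend[test] + model.predict(weather[test])
actual = yield_obs[test]

mape = np.mean(np.abs((actual - pred) / actual)) * 100
rmse = np.sqrt(np.mean((actual - pred) ** 2))
r2 = 1 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)
print(f"MAPE={mape:.2f}%  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Passing a fraction to `PCA(n_components=...)` keeps the smallest number of components whose cumulative explained variance reaches that fraction, which mirrors the 90% criterion in the abstract. The numbers printed here come from synthetic data and are not comparable to the reported results.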