Abstract:
Accurate acquisition of environmental parameters is a prerequisite for the precise regulation of crop growth in solar greenhouses. However, sensor data is susceptible to external environmental interference and hardware failures, which frequently produce anomalous values. Even a small number of these anomalies can cause serious regulation errors, with adverse effects on crop productivity and yield stability. In this research, a framework was proposed to detect and correct anomalous environmental sensor data in a greenhouse. The framework was structured into three stages. First, a Long Short-Term Memory (LSTM) network was employed to extract deep temporal features from the sequential environmental data in the solar greenhouse, capturing long-term dependencies and non-linear trends. Second, these temporal features were fed into an eXtreme Gradient Boosting (XGBoost) model, whose regression capability complemented the temporal representation, so that sensor values were predicted with high precision even under fluctuating conditions. Finally, dynamic thresholding with a sliding window was applied to the residuals, i.e., the differences between the predicted and observed values, to identify outliers in real time for the subsequent correction of data anomalies. Experimental trials were conducted to evaluate the framework. The LSTM-XGBoost fusion model achieved a True Positive Rate (TPR) of 99.2% to 99.8% across the monitored parameters, including air temperature, relative humidity, and Photosynthetically Active Radiation (PAR). The accuracy ranged from 98.9% to 99.8%.
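The third stage (sliding-window thresholding of residuals) can be illustrated with a minimal sketch. The abstract does not state the threshold rule, the window length, or the correction method, so everything here is an assumption made for illustration: the band is taken as the mean plus or minus `k` standard deviations of recent unflagged residuals, `window` and `k` are hypothetical parameters, and a flagged reading is replaced by the model prediction. The `predicted` series stands in for the output of the LSTM-XGBoost model, which is not reproduced here.

```python
from statistics import fmean, pstdev


def detect_anomalies(observed, predicted, window=24, k=3.0, min_sigma=1e-6):
    """Flag readings whose residual leaves a band built from recent residuals.

    ASSUMPTION: the band is mean +/- k * std of the previous `window`
    residuals. The actual rule, window length, and k are not given in
    the abstract.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    flags = [False] * len(residuals)
    for t in range(window, len(residuals)):
        # Use only recent residuals that were not flagged themselves,
        # so one anomaly does not widen the band for the next readings.
        clean = [
            r
            for r, flagged in zip(residuals[t - window:t], flags[t - window:t])
            if not flagged
        ]
        if len(clean) < 2:
            continue  # not enough history to estimate a band
        mu = fmean(clean)
        # Floor the spread so a perfectly flat history does not flag everything.
        sigma = max(pstdev(clean), min_sigma)
        flags[t] = abs(residuals[t] - mu) > k * sigma
    return flags


def correct_anomalies(observed, predicted, flags):
    """Replace flagged readings with the prediction (assumed correction rule)."""
    return [p if flagged else o for o, p, flagged in zip(observed, predicted, flags)]
```

The first `window` readings are never flagged in this sketch because no history exists for them yet; a deployed system would need a warm-up policy of its own.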
Comparative analysis indicated that the fusion model significantly outperformed both individual baselines. Compared with the standalone LSTM and XGBoost models, the TPR increased by 15.4% and 17.3%, respectively, while the overall accuracy improved by 18.3% and 16.2%, respectively. These gains were achieved by combining recurrent neural networks with ensemble learning for high-dimensional environmental data. In conclusion, the LSTM-XGBoost fusion model detected anomalies with high accuracy and was robust against noise. Its high precision helps bridge the gap between raw data acquisition and precise environmental control. The findings can provide technical support for decision-making in the real-time monitoring of environmental parameters in solar greenhouses. Stable and optimal environmental parameters can ultimately contribute to crop growth in the digital greenhouses of smart agriculture.
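For reference, the two reported metrics can be computed from labelled anomaly flags as sketched below. This assumes the standard confusion-matrix definitions, TPR = TP / (TP + FN) and accuracy = (TP + TN) / N, which the abstract does not spell out; the function name and inputs are hypothetical.

```python
def detection_metrics(true_flags, predicted_flags):
    """Return (TPR, accuracy) for boolean anomaly labels and detections.

    ASSUMPTION: standard definitions are used; the paper's exact
    evaluation protocol is not described in the abstract.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_flags, predicted_flags):
        if truth and pred:
            tp += 1  # anomaly correctly detected
        elif truth and not pred:
            fn += 1  # anomaly missed
        elif pred:
            fp += 1  # normal reading wrongly flagged
        else:
            tn += 1  # normal reading correctly passed
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    accuracy = (tp + tn) / total if total else float("nan")
    return tpr, accuracy
```

TPR is undefined when the evaluation set contains no true anomalies, so the sketch returns NaN in that case rather than a misleading zero.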