
Quality control method for solar greenhouse environmental sensor data based on LSTM-XGBoost

Data quality control method for solar greenhouse environmental parameters by sensors using a hybrid LSTM-XGBoost model

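  • Summary: Accurate acquisition of environmental parameters in solar greenhouses is essential for regulating crop growth, but sensor data are prone to anomalies caused by environmental interference and equipment failure. Even a small number of anomalies can cause regulation errors and, in turn, seriously affect crop productivity. This study therefore proposes a self-supervised method, based on a cascaded fusion model, for detecting and correcting anomalies in greenhouse environmental data. The method first uses a long short-term memory (LSTM) network to capture long-range temporal dependencies in the environmental data, then feeds the LSTM output into extreme gradient boosting (eXtreme gradient boosting, XGBoost), where nonlinear residual compensation further strengthens the feature representation and improves how accurately the model fits dynamic changes in complex growing environments. Finally, a sliding-window dynamic threshold rule built on the prediction deviation identifies and calibrates anomalous data. Experimental results show that the LSTM-XGBoost fusion model achieved detection rates of 99.2% to 99.8% and accuracies of 98.9% to 99.8% for air temperature, relative humidity, and photosynthetically active radiation. Compared with the standalone LSTM and XGBoost models, the detection rate improved by 15.4 and 17.3 percentage points, respectively, and the accuracy by 18.3 and 16.2 percentage points, respectively. In summary, the proposed LSTM-XGBoost fusion model shows good anomaly detection performance and can provide effective technical support for real-time monitoring and precise regulation of environmental parameters in solar greenhouses.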
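The last step of the method, a sliding-window dynamic threshold on the prediction deviation, can be illustrated with a minimal sketch. The abstract does not state the threshold formula, so the rule used here (mean plus or minus `k` standard deviations of recent accepted residuals) is an assumption, as are the function name `detect_and_correct`, the parameters `window`, `k`, and `min_sigma`, and the choice to replace a flagged reading with the model prediction. The predicted values stand in for the output of the LSTM-XGBoost model, and the readings are made-up illustrative numbers.

```python
from statistics import mean, stdev


def detect_and_correct(observed, predicted, window=5, k=3.0, min_sigma=1e-6):
    """Flag readings whose residual falls outside a sliding-window threshold.

    Hypothetical rule (not taken from the paper): a reading is anomalous when
    its residual deviates from the mean of the last `window` accepted
    residuals by more than `k` standard deviations.
    """
    history = []  # residuals of readings accepted as normal so far
    flags, corrected = [], []
    for obs, pred in zip(observed, predicted):
        residual = obs - pred
        if len(history) >= window:
            recent = history[-window:]
            mu = mean(recent)
            sigma = max(stdev(recent), min_sigma)  # avoid a zero-width band
            is_anomaly = abs(residual - mu) > k * sigma
        else:
            is_anomaly = False  # warm-up: not enough residuals to set a threshold
        flags.append(is_anomaly)
        if is_anomaly:
            corrected.append(pred)  # assumed correction: substitute the prediction
        else:
            corrected.append(obs)
            history.append(residual)  # only normal residuals update the window
    return flags, corrected


# Illustrative air-temperature readings (degrees Celsius) with one injected spike.
predicted = [20.0] * 10
observed = [20.1, 19.9, 20.05, 19.95, 20.1, 19.9, 35.0, 20.05, 19.95, 20.0]
flags, corrected = detect_and_correct(observed, predicted)
```

Keeping flagged residuals out of the window is a deliberate choice in this sketch: a single large spike would otherwise inflate the standard deviation and hide the anomalies that follow it.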

     

    Abstract: Accurate acquisition of environmental parameters is essential for regulating crop growth in solar greenhouses. However, sensor data are susceptible to environmental interference and hardware failures, which frequently produce anomalous values. Even a small number of anomalies can cause regulation errors and, in turn, seriously reduce crop productivity and yield stability. This study proposes a framework for detecting and correcting anomalous environmental sensor data in greenhouses, built on time-series prediction in three stages. First, a Long Short-Term Memory (LSTM) network extracts temporal features from the sequential environmental data, capturing long-term dependencies and non-linear trends. Second, these temporal features are fed into an eXtreme Gradient Boosting (XGBoost) model, whose regression capability refines the temporal representation so that sensor values are predicted with high precision even under fluctuating conditions. Finally, a dynamic threshold computed over a sliding window is applied to the residuals (the differences between predicted and observed values) to identify outliers in real time and support their subsequent correction. Experiments were conducted to evaluate the framework. The LSTM-XGBoost fusion model achieved a True Positive Rate (TPR) of 99.2% to 99.8% and an accuracy of 98.9% to 99.8% across air temperature, relative humidity, and Photosynthetically Active Radiation (PAR).
Comparative analysis showed that the fusion model outperformed both individual baselines. Compared with the standalone LSTM and XGBoost models, the TPR increased by 15.4 and 17.3 percentage points, respectively, and the accuracy by 18.3 and 16.2 percentage points, respectively. These gains are attributed to combining a recurrent neural network with ensemble learning for the environmental data. In conclusion, the LSTM-XGBoost fusion model showed good performance and robustness against noise in anomaly detection, helping to bridge the gap between raw data acquisition and precise environmental control. The method can provide technical support for the real-time monitoring and precise regulation of environmental parameters in solar greenhouses, and stable, well-regulated conditions can ultimately benefit crop growth in digital greenhouses for smart agriculture.
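The detection rate (TPR) and accuracy reported above can be computed from labeled detections, as in the short sketch below. It uses the standard confusion-matrix definitions, which the abstract does not spell out, so treating them as the paper's exact definitions is an assumption. The helper names and the sample labels are hypothetical and illustrative, not data from the study. The sketch also shows why a gain in percentage points (a difference of two rates) is not the same as a relative gain in percent.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (TPR, accuracy) for binary labels, where 1 marks an anomaly."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1  # anomaly correctly detected
        elif truth == 1 and pred == 0:
            fn += 1  # anomaly missed
        elif truth == 0 and pred == 1:
            fp += 1  # normal reading wrongly flagged
        else:
            tn += 1  # normal reading correctly passed
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return tpr, accuracy


def percentage_point_gain(new_rate, old_rate):
    """Absolute difference between two rates, expressed in percentage points."""
    return (new_rate - old_rate) * 100


def relative_gain_percent(new_rate, old_rate):
    """Relative improvement over the old rate, expressed in percent."""
    return (new_rate - old_rate) / old_rate * 100


# Illustrative labels only: 4 true anomalies among 10 readings.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tpr, accuracy = detection_metrics(true_labels, predicted_labels)

# Going from a rate of 0.50 to 0.75 is +25 percentage points but +50 percent.
pp_gain = percentage_point_gain(0.75, 0.50)
rel_gain = relative_gain_percent(0.75, 0.50)
```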

     
