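Effect of preprocessing methods on rapid straw-nutrient measurement models built from heterogeneous-source near-infrared data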

王鑫磊; 韩鲁佳; 杨增玲; 肖卫华; 王婕妤; 于涞源; 戴嘉伟; 朱礼强

doi:10.11975/j.issn.1002-6819.202410140


Influence of preprocessing methods on rapid quantitative models for crop straw nutrients using data from different near-infrared spectroscopy devices

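Abstract

Spectral preprocessing methods (PMs) differ in how well they suit rapid straw-nutrient measurement models built on near-infrared instruments designed for different application scenarios, so the differing effects of individual PMs and their combinations on models built from heterogeneous-source data, and the mechanisms behind those effects, need closer study. This study collected 250 straw samples together with their nutrient data and acquired spectra with two types of near-infrared instrument, one laboratory-grade and one for online engineering use. Using five PMs (detrending, multiplicative scatter correction, standard normal variate transformation, Savitzky-Golay convolution smoothing, and the Savitzky-Golay convolution first derivative) and their combinations, 156 prediction models were built. The effects of the PMs and their combinations on the data and models from each instrument were visualized from multiple dimensions, and the differences in functional-group information implied by the models were analyzed. The results show that the autocorrelation of straw near-infrared spectra is markedly band-specific and falls into three intervals: 10 000 to 7 100, 7 100 to 5 300, and 5 300 to 4 000 cm⁻¹. PMs improve models by increasing the weight of the characteristic bands related to the target chemical component, but only some PMs and combinations significantly improve model accuracy. Near-infrared prediction models for straw carbon were the most accurate and those for the carbon-to-nitrogen ratio the least accurate. Instrument differences lead to divergent PM response mechanisms: with the laboratory-grade instrument, a single PM such as multiplicative scatter correction or Savitzky-Golay convolution smoothing was enough to bring the straw carbon and nitrogen models to their optimum, whereas the online engineering instrument needed detrending combined with other PMs to improve model quality. The study can provide data models and theoretical support for rapid straw-nutrient measurement technology and for the wider adoption of online engineering instruments.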
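The abstract does not say how the count of 156 models arises. The sketch below shows one enumeration that is consistent with that number, under the assumption (not confirmed by the source) that each of the 2 instruments and 3 targets (C, N, C/N) was modeled with raw spectra, each single method, and each ordered pair of two different methods. The abbreviations follow the English abstract; the function name `enumerate_pipelines` is hypothetical.

```python
from itertools import permutations

# Abbreviations as used in the paper's English abstract.
METHODS = ["DTD", "MSC", "SNV", "SG", "SGD1"]
DEVICES = ["laboratory", "online"]  # two instrument types
TARGETS = ["C", "N", "C/N"]         # three predicted properties


def enumerate_pipelines(methods):
    """Return the raw-spectra case, each single method, and each ordered pair.

    Assumption: combinations are ordered pairs of two different methods,
    so (SG, SNV) and (SNV, SG) count as separate pipelines.
    """
    raw = [()]                                      # no preprocessing
    singles = [(m,) for m in methods]               # 5 single methods
    ordered_pairs = list(permutations(methods, 2))  # 5 * 4 = 20 ordered pairs
    return raw + singles + ordered_pairs


pipelines = enumerate_pipelines(METHODS)
n_models = len(pipelines) * len(DEVICES) * len(TARGETS)
print(len(pipelines), n_models)  # 26 156
```

Treating the pairs as ordered is what makes the arithmetic work out (1 + 5 + 20 = 26 pipelines per data set), and it would also make it possible to compare the two orders of a given pair. The full text may define the combinations differently.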

Abstract: Near-infrared spectroscopy (NIRS) is widely used for the rapid assessment of crop residue nutrients. However, spectral preprocessing affects the performance of NIRS models differently depending on the application and the device. This study therefore explores the effects and mechanisms of preprocessing techniques on rapid quantitative models built from the data of different near-infrared devices. A total of 250 crop residue samples were collected, covering rice, wheat, maize, rapeseed, and cotton, and their carbon (C), nitrogen (N), and carbon-to-nitrogen ratio (C/N) data were analyzed systematically. Spectra were captured with two representative near-infrared devices: one for laboratory use and one for online engineering applications. Five preprocessing techniques were evaluated: detrending (DTD), multiplicative scatter correction (MSC), standard normal variate transformation (SNV), Savitzky-Golay smoothing (SG), and the Savitzky-Golay convolution first derivative (SGD1). The impact of these techniques and their combinations on the data from the laboratory and online engineering devices was examined from multiple dimensions. Prediction models were built with partial least squares (PLS) and evaluated by RMSE and R², with the data split by sample set partitioning based on joint x-y distance (SPXY). The chemical functional-group information implied by each model under different preprocessing conditions was also analyzed, and the functional groups highlighted by variable importance varied greatly across models. The results show that spectral curves from the laboratory device exhibited higher variability in the 7 000-4 000 cm⁻¹ range, which was beneficial for the quantitative models, whereas the online engineering device showed prominent waveforms in the 10 000-7 000 cm⁻¹ range; both devices shared similar spectral patterns in the 7 000-4 000 cm⁻¹ range. Autocorrelation analysis of the spectra identified three correlation intervals: 10 000-7 100 cm⁻¹, 7 100-5 300 cm⁻¹, and 5 300-4 000 cm⁻¹, corresponding to the near-infrared first overtone and combination regions. MSC and SNV had similar preprocessing effects on the laboratory device data, while DTD and SGD1 were more effective for the online engineering device, particularly in the 4 400-4 000 cm⁻¹ range. Overall, better performance depended mainly on an appropriate selection of preprocessing techniques and their combinations, which increased the weight of the spectral bands related to the chemical property being predicted; as a result, the RMSE decreased by 0.01-0.10 and the R² increased by 0.01-0.18. Prediction accuracy ranked C > N > C/N. For the same material and property, the important spectral variables remained consistent across devices, although their weights varied greatly. The laboratory device yielded relatively accurate models with minimal influence from preprocessing, and optimal performance was achieved with a single technique, such as MSC for C, SG for N, or SNV for C/N. In contrast, the online engineering device required preprocessing to optimize model performance, such as DTD alone or in combination with the other techniques. Furthermore, when SG was combined with another technique, the order of the combination did not need to be prioritized. Preprocessing should therefore be tailored to the specific device and material property when developing NIRS models. These findings can provide theoretical support for optimizing online engineering devices, and the data and insights can contribute to the high-value, low-carbon utilization of crop residues on a large scale.
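The five preprocessing techniques named in the abstract can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no parameters, so the linear detrend, the SG window of 11 points, the polynomial order of 2, and the synthetic spectra are all assumptions, and the helper names (`dtd`, `snv`, `msc`, `sg`, `sgd1`, `apply_pipeline`) are hypothetical. Spectra are stored as rows of `X`.

```python
import numpy as np
from scipy.signal import detrend, savgol_filter


def dtd(X):
    """Detrending: subtract a least-squares straight line from each spectrum
    (linear order is an assumption; the abstract does not state it)."""
    return detrend(X, axis=1, type="linear")


def snv(X):
    """Standard normal variate: center and scale each spectrum by its own mean and std."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ≈ intercept + slope * ref
        out[i] = (row - intercept) / slope
    return out


def sg(X, window=11, polyorder=2):
    """Savitzky-Golay smoothing (window and order are assumed values)."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, axis=1)


def sgd1(X, window=11, polyorder=2):
    """Savitzky-Golay first derivative, per index step (delta is left at 1.0)."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, deriv=1, axis=1)


PREPROCESSORS = {"DTD": dtd, "MSC": msc, "SNV": snv, "SG": sg, "SGD1": sgd1}


def apply_pipeline(X, steps):
    """Apply the named preprocessing steps in the given order."""
    for name in steps:
        X = PREPROCESSORS[name](X)
    return X


# Synthetic demo: one Gaussian band with a random gain and offset per sample.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(10000, 4000, 301)
base = np.exp(-((wavenumbers - 5200) / 300) ** 2)
scale = rng.uniform(0.8, 1.2, size=(5, 1))
offset = rng.uniform(-0.1, 0.1, size=(5, 1))
X = scale * base + offset

X_msc = apply_pipeline(X, ["MSC"])
X_snv = apply_pipeline(X, ["SNV"])
X_combo = apply_pipeline(X, ["DTD", "SGD1"])
```

In this synthetic case the samples differ only by gain and offset, so MSC maps every row onto the mean spectrum and SNV gives every row zero mean and unit standard deviation, which is one way to see why the two behave similarly on clean data. Real straw spectra also vary chemically, so this equivalence would only be approximate.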
