Abstract:
Near-infrared spectroscopy (NIRS) has been widely used to rapidly assess the nutrient content of crop residues. However, spectral preprocessing methods can affect the performance of NIRS models differently, depending mainly on the application and the device. This study therefore explores the effects and mechanisms of preprocessing techniques on rapid quantitative models built from different near-infrared devices. A total of 250 crop residue samples were collected from rice, wheat, maize, rapeseed, and cotton. Their carbon (C) content, nitrogen (N) content, and carbon-to-nitrogen ratio (C/N) were then analyzed systematically. Spectra were acquired with two representative near-infrared devices: one for laboratory use and one for online engineering applications. Five preprocessing techniques were evaluated: detrending (DTD), multivariate scatter correction (MSC), standard normal variate transformation (SNV), Savitzky-Golay smoothing (SG), and the Savitzky-Golay first derivative (SGD1). The effects of these techniques and their combinations on data from the laboratory and online engineering devices were investigated from multiple perspectives. The data were split using sample set partitioning based on joint x-y distance (SPXY). Partial least squares (PLS) prediction models were then built and evaluated by the root mean square error (RMSE) and the coefficient of determination (R²). The chemical functional groups reflected by each model under the different preprocessing conditions were also examined. The importance of the variables associated with these groups varied greatly across models. The results show that the spectral curves from the laboratory device exhibited higher variability in the 7 000-4 000 cm⁻¹ range, which benefited the quantitative models. In contrast, the online engineering device showed prominent waveforms in the 10 000-7 000 cm⁻¹ range. Between 7 000 and 4 000 cm⁻¹, the two devices shared similar spectral patterns. Autocorrelation analysis of the spectra divided them into three correlated intervals: 10 000-7 100 cm⁻¹, 7 100-5 300 cm⁻¹, and 5 300-4 000 cm⁻¹, corresponding to the near-infrared overtone and combination regions. Furthermore, MSC and SNV had similar preprocessing effects on the laboratory device. DTD and SGD1 were more effective for the online engineering device, particularly in the 4 400-4 000 cm⁻¹ range. Overall, model performance depended mainly on selecting appropriate preprocessing techniques and combinations. Appropriate preprocessing also increased the weights of the spectral bands used to predict the chemical properties. As a result, the RMSE decreased by 0.01-0.10 and R² increased by 0.01-0.18. Prediction accuracy ranked in descending order as C > N > C/N. The importance of the spectral variables remained consistent across devices for the same material and property, although the variable weights varied considerably. The laboratory device yielded relatively accurate models that were only minimally affected by preprocessing. Its optimal performance was achieved with a single preprocessing step: MSC for C, SG for N, and SNV for C/N. In contrast, the online engineering device required preprocessing, such as DTD alone or combined with the other techniques, to optimize model performance. Furthermore, when SG was part of a preprocessing combination, the order of the steps did not need to be prioritized. Preprocessing should therefore be tailored to the specific device and material property when NIRS models are developed. These findings also provide theoretical support for optimizing online engineering devices. The data and insights can contribute to the high-value, low-carbon utilization of crop residues on a large scale.
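The five preprocessing techniques compared above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are choices made here: window length 11, polynomial order 2, second-order detrending, the mean spectrum as the MSC reference, the sample (ddof=1) standard deviation in SNV, and edge padding at the spectrum ends. The function names and the `PREPROCESSORS` mapping are hypothetical. In practice, `scipy.signal.savgol_filter` is the usual library routine for SG smoothing and derivatives. It is written out here only to show how the filter works and to keep the sketch NumPy-only.

```python
import math

import numpy as np


def snv(X):
    """SNV: centre each spectrum (row) on its own mean and scale by its own std (ddof=1, an assumption)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """MSC: fit row = offset + slope * reference by least squares, return (row - offset) / slope.
    The reference defaults to the mean spectrum (an assumption; a fixed reference can be passed)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        slope, offset = np.polyfit(ref, row, 1)  # polyfit returns highest power first
        out[i] = (row - offset) / slope
    return out


def detrend(X, order=2):
    """DTD: subtract a polynomial (second order by default, an assumption) fitted to each
    spectrum against the variable index."""
    X = np.asarray(X, dtype=float)
    t = np.arange(X.shape[1], dtype=float)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        out[i] = row - np.polyval(np.polyfit(t, row, order), t)
    return out


def savgol(X, window=11, polyorder=2, deriv=0):
    """Savitzky-Golay filter applied to each spectrum.

    deriv=0 smooths (SG); deriv=1 gives the first derivative (SGD1), expressed per
    variable step rather than per cm^-1.
    """
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("window must be odd, window > polyorder, and deriv <= polyorder")
    X = np.asarray(X, dtype=float)
    m = window // 2
    # Design matrix of the local polynomial: columns t**0, t**1, ..., t**polyorder.
    A = np.vander(np.arange(-m, m + 1), polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps window values to the fitted coefficient
    # of t**deriv; multiplying by deriv! gives the derivative at the window centre.
    weights = np.linalg.pinv(A)[deriv] * math.factorial(deriv)
    # Repeat the end values so the output keeps the input length (edge handling is an assumption).
    padded = np.pad(X, ((0, 0), (m, m)), mode="edge")
    # np.convolve flips its kernel, so pass the weights reversed to get a correlation.
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in padded])


# Hypothetical registry so combinations can be written as lists of names.
PREPROCESSORS = {
    "DTD": detrend,
    "MSC": msc,
    "SNV": snv,
    "SG": lambda X: savgol(X, deriv=0),
    "SGD1": lambda X: savgol(X, deriv=1),
}


def preprocess(X, steps):
    """Apply the named preprocessing steps from left to right, e.g. ["DTD", "SGD1"]."""
    for name in steps:
        X = PREPROCESSORS[name](X)
    return X
```

With one spectrum per row of `X`, a combination such as DTD followed by SGD1 is applied as `preprocess(X, ["DTD", "SGD1"])`. Each function returns an array of the same shape, so steps can be chained in any order.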
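The SPXY data split and the RMSE and R² metrics mentioned above can likewise be sketched with NumPy. SPXY is implemented here as Kennard-Stone selection on a joint distance. Euclidean distances in spectral space (X) and property space (y) are each divided by their maximum and then added. The function names, the Euclidean metric, and the training-set size passed in are assumptions for illustration; the abstract does not give these details.

```python
import numpy as np


def spxy_split(X, y, n_train):
    """SPXY sketch: Kennard-Stone selection on max-normalized X distance + y distance."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(len(X), -1)
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    def pairwise(A):
        # Euclidean distances via the Gram matrix, avoiding an n x n x p temporary array.
        sq = (A ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
        return np.sqrt(np.clip(d2, 0.0, None))

    def normalized(D):
        return D / D.max() if D.max() > 0 else D

    d = normalized(pairwise(X)) + normalized(pairwise(Y))
    # Start from the two samples that are farthest apart in the joint space.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    if i == j:
        raise ValueError("all samples are identical in both X and y")
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the sample whose nearest already-selected sample is farthest away.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected), np.array(remaining, dtype=int)


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A PLS model could then be fitted on the training rows, for example with scikit-learn's `PLSRegression(n_components=...)`, and scored on the held-out rows with `rmse` and `r2`. The number of latent variables is not stated in the abstract and would have to be chosen, e.g. by cross-validation.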