Lightweight moisture content detection of peas based on hyperspectral imaging and baseline compensation mechanism

欧阳尚韬; 侯佑飞; 黎艳兵; 徐添琦; 徐晨光; 陈孟禹; 刘燕德; 李斌

doi:10.11975/j.issn.1002-6819.202601123

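Abstract

Real-time monitoring of moisture content during the drying of peas is essential for ensuring finished-product quality and extending shelf life. To address the destructiveness, long turnaround, and poor suitability for online use of conventional detection methods, this study used visible-near infrared (Vis-NIR) hyperspectral imaging (372.66 to 1039.65 nm), combined with chemometric methods, to propose a high-accuracy, lightweight, non-destructive detection strategy for pea moisture content. Peas were dehydrated by hot-air drying at nine drying-time levels, yielding 360 samples with an evenly distributed moisture gradient; hyperspectral images were acquired and mean reflectance spectra were extracted. To eliminate multicollinearity in the high-dimensional spectral data and improve computational efficiency, four feature-wavelength selection algorithms were compared: least absolute shrinkage and selection operator (LASSO), bootstrapping soft shrinkage (BOSS), particle swarm optimization (PSO), and competitive adaptive reweighted sampling (CARS). These were combined with partial least squares regression (PLSR), least squares support vector machine (LSSVM), categorical boosting (CatBoost), and a one-dimensional convolutional neural network (1D-CNN) to build prediction models. CARS performed best: the 10 feature wavelengths it selected (only 5.7% of the full band) covered the characteristic water absorption region and also included low-correlation background reference regions, and a feature-band ablation experiment confirmed that including these low-correlation bands effectively compensated for the physical baseline drift caused by shrinkage of the peas during drying. The resulting CARS-LSSVM nonlinear model achieved the best performance (prediction-set coefficient of determination $R_P^2$ = 0.9648, root mean square error of prediction RMSEP = 0.0477). Its accuracy was slightly higher than that of the PSO-LSSVM model, which retained 77 features ($R_P^2$ = 0.9637), and its running time of 0.042 s was far lower than that of the deep learning model 1D-CNN, whose mean running time exceeded 20 s. The study shows that mining a minimal feature subset with coupled physical-chemical meaning using CARS, combined with an LSSVM model, enables low-cost, high-efficiency online detection of pea moisture content.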
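For readers less familiar with the two accuracy figures quoted above, they are conventionally defined as below. The abstract does not give the paper's own formulas, so this is the standard chemometrics form, with the symbols assumed here: $n_P$ is the number of prediction-set samples, $y_i$ the measured moisture content, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the measured values.

```latex
R_P^2 = 1 - \frac{\sum_{i=1}^{n_P} \left( y_i - \hat{y}_i \right)^2}
                 {\sum_{i=1}^{n_P} \left( y_i - \bar{y} \right)^2},
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{n_P} \sum_{i=1}^{n_P} \left( y_i - \hat{y}_i \right)^2}
```

An $R_P^2$ close to 1 and an RMSEP close to 0 both indicate that predictions track the reference measurements closely.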

Abstract: Real-time monitoring of moisture content during pea drying is essential for ensuring the quality of finished products and extending shelf life. However, conventional methods such as oven drying are destructive, time-consuming, and subject to severe lag, so they cannot fully meet the needs of online industrial monitoring in modern agriculture. Compared with conventional single-point near-infrared spectroscopy, hyperspectral imaging is expected to reduce the background noise caused by irregular shrinkage during pea drying. In this study, a high-precision, non-destructive, and lightweight detection strategy for pea moisture content was proposed using visible and near-infrared (Vis-NIR) hyperspectral imaging. A total of 360 pea samples (variety "Zhongwan No. 6") were prepared and subjected to hot-air drying at 60 °C. Samples were collected at nine time intervals to cover the full range from the fresh, high-moisture state to the dried, low-moisture state, giving a dataset with a representative moisture gradient. Hyperspectral images of all samples were acquired over the spectral range of 372.66 to 1039.65 nm. Threshold segmentation (reflectance between 0.15 and 1.0 at 857.53 nm) was applied to extract the region of interest (ROI), thereby avoiding subjective human error. The raw spectra were then preprocessed with standard normal variate (SNV) to eliminate optical path differences. The dataset was partitioned into a calibration set and a prediction set at a ratio of 3:1 using the Kennard-Stone algorithm. Four feature-wavelength selection algorithms were compared: least absolute shrinkage and selection operator (LASSO), bootstrapping soft shrinkage (BOSS), particle swarm optimization (PSO), and competitive adaptive reweighted sampling (CARS).
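The data-preparation steps named above (ROI thresholding, SNV, Kennard-Stone partitioning) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the nested-list data layout, the toy reflectance values, and the choice to run Kennard-Stone on the SNV-corrected spectra are all hypothetical.

```python
import math


def roi_mean_spectrum(cube, band_index, low=0.15, high=1.0):
    """Average the spectra of pixels whose reflectance at one band lies in [low, high].

    cube[row][col] is the reflectance spectrum (a list) of one pixel.
    """
    pixels = [
        spectrum
        for row in cube
        for spectrum in row
        if low <= spectrum[band_index] <= high
    ]
    if not pixels:
        raise ValueError("no pixel passed the threshold")
    n_bands = len(pixels[0])
    return [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]


def kennard_stone(samples, n_select):
    """Return indices of n_select (at least 2) samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical mean spectra (4 bands each); real spectra have far more bands.
raw = [
    [0.42, 0.45, 0.51, 0.40],
    [0.30, 0.33, 0.39, 0.28],
    [0.52, 0.50, 0.61, 0.47],
    [0.25, 0.29, 0.30, 0.22],
    [0.44, 0.46, 0.50, 0.41],
    [0.35, 0.41, 0.43, 0.30],
    [0.48, 0.47, 0.58, 0.45],
    [0.28, 0.30, 0.37, 0.27],
]
corrected = [snv(s) for s in raw]
calibration_idx = kennard_stone(corrected, n_select=6)  # 3:1 split of 8 samples
prediction_idx = [i for i in range(len(raw)) if i not in calibration_idx]
```

In the toy data the second spectrum is the first one shifted down by 0.12, so both map to the same SNV profile, which is the additive-offset removal that SNV is used for.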
Prediction models were then built on the selected features using partial least squares regression (PLSR), least squares support vector machine (LSSVM), categorical boosting (CatBoost), and a lightweight one-dimensional convolutional neural network (1D-CNN). Spectral response analysis revealed that drying induced a significant downward baseline drift across the entire band. According to Kubelka-Munk theory, this was attributed to the physical shrinkage and surface roughness of the peas, which dramatically enhanced diffuse scattering. The experimental results indicated that the CARS algorithm gave the best feature dimensionality reduction: it selected a subset of ten characteristic wavelengths, only 5.7% of the full spectrum. An ablation study on the feature wavelengths showed that CARS retained both high-correlation water absorption bands (e.g., around 970 nm) and low-correlation baseline reference points (e.g., 504.14 and 1035.63 nm). When the low-correlation baseline reference points were artificially removed, the prediction coefficient of determination of the model dropped sharply to 0.7155. These variables therefore effectively compensated for the physical baseline drift caused by drying shrinkage, through an implicit mathematical differencing. The CARS-LSSVM nonlinear model achieved the best overall performance, with a prediction coefficient of determination ($R_P^2$) of 0.9648 and a root mean square error of prediction (RMSEP) of 0.0477. Its accuracy exceeded that of the PSO-LSSVM model, which retained 77 features. The CARS-LSSVM model was also computationally efficient: its running time was 0.042 s on the same hardware platform, making it hundreds of times faster than the deep learning model (1D-CNN). In conclusion, the strategy effectively extracted pure chemical information by decoupling the physical-chemical coupling effects during drying.
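The idea of a low-correlation reference band compensating for baseline drift can be illustrated with a toy two-band model. This is only a sketch under strong assumptions that are ours, not the paper's: the drift is purely additive and identical at both bands, the absorption band responds linearly to moisture with a made-up sensitivity `K_ABS`, and all numbers are hypothetical rather than taken from the ablation study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: moisture content and an additive baseline offset that
# changes from sample to sample (standing in for shrinkage-induced drift).
moisture = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
baseline = [0.62, 0.48, 0.70, 0.52, 0.66, 0.45, 0.58, 0.50]
K_ABS = 0.3  # assumed sensitivity of the absorption band to moisture

# Absorption band: baseline minus a moisture-dependent dip.
r_abs = [b - K_ABS * m for b, m in zip(baseline, moisture)]
# Reference band: baseline only, so it contains no moisture term of its own.
r_ref = list(baseline)

# Model 1: absorption band alone; the drift acts as noise.
slope, intercept = fit_line(r_abs, moisture)
r2_single = r_squared(moisture, [slope * v + intercept for v in r_abs])

# Model 2: reference minus absorption band cancels the shared baseline.
diff = [ref - ab for ref, ab in zip(r_ref, r_abs)]
slope, intercept = fit_line(diff, moisture)
r2_diff = r_squared(moisture, [slope * v + intercept for v in diff])

print(f"absorption band only: R^2 = {r2_single:.3f}")
print(f"with reference band:  R^2 = {r2_diff:.3f}")
```

In this idealized setting the difference feature removes the drift entirely, whereas the absorption band alone fits much worse. A regression model given both bands can learn such a difference on its own, which is one way to read the "implicit mathematical differencing" described above.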
The CARS algorithm was used to mine a minimal feature subset, which was combined with the LSSVM model to give a robust solution for detecting pea moisture content. These findings can also provide data support and theoretical guidance for the development of low-cost, low-power, handheld multispectral intelligent sensors.
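For context on why an LSSVM on a handful of wavelengths is cheap to run, the sketch below shows the standard LSSVM regression formulation, in which training reduces to solving one linear system. It is not the authors' code: the RBF kernel form, the hyperparameter values `GAMMA` and `SIGMA`, the helper names, and the two-feature toy data are all assumptions made for illustration.

```python
import math


def rbf(a, b, sigma):
    """RBF kernel exp(-||a - b||^2 / (2 sigma^2)); other scalings are also in use."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    rows = [[0.0] + [1.0] * n]
    for i in range(n):
        rows.append(
            [1.0]
            + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0) for j in range(n)]
        )
    solution = solve(rows, [0.0] + list(y))
    return solution[0], solution[1:]  # bias, alphas


def lssvm_predict(X_train, bias, alphas, x, sigma):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, X_train))


# Hypothetical calibration data: two selected bands per sample.
X = [[0.59, 0.62], [0.42, 0.48], [0.61, 0.70], [0.40, 0.52], [0.51, 0.66], [0.27, 0.45]]
y = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
GAMMA, SIGMA = 100.0, 0.5  # assumed regularization and kernel width

bias, alphas = lssvm_fit(X, y, GAMMA, SIGMA)
prediction = lssvm_predict(X, bias, alphas, [0.45, 0.55], SIGMA)
```

With only ten selected wavelengths per sample, each kernel evaluation touches ten numbers, so prediction cost scales with the number of calibration samples rather than with the full spectral dimension.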
