
Sorting of Mixed Oryza sativa L. Seeds by Terahertz Spectrum and Feature Selection Algorithm

  • Abstract: Agricultural production safety is an important component of food safety, and japonica rice (Oryza sativa subsp. japonica Kato) is a staple of the daily diet, so the rapid detection of seed lots adulterated with inferior seeds is an important research problem. In this study, terahertz time-domain spectroscopy was used to collect spectra from 220 samples of mixed and pure rice seeds. The time-domain signals were converted to frequency-domain signals by Fourier transform (FT) to form the modeling data set, and five pattern recognition models, including QUEST, were compared for sorting. Three feature selection algorithms, random forest (RF), the successive projections algorithm (SPA), and variable combination population analysis coupled with iteratively retaining informative variables (VCPA-IRIV), selected 9, 6, and 25 important feature frequencies, respectively; the feature frequencies selected by the coupled VCPA-IRIV algorithm carried the richest spectral information. Models built after feature frequency selection were significantly better than full-spectrum models in both analysis speed and recognition accuracy, and the QUEST and KNN classifiers built on the 25 feature frequencies selected by VCPA-IRIV both reached 100% accuracy in identifying whether a sample was mixed. VCPA-IRIV can therefore effectively select information-rich terahertz feature frequencies and improve the accuracy of the resulting recognition models. The identification model based on terahertz spectroscopy and a coupled feature selection algorithm is fast and accurate, and offers a new method for detecting inferior mixed japonica rice seeds.
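The workflow summarized in the abstract (Fourier transform of time-domain traces, modeling on a few selected feature frequencies, KNN classification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the traces are synthetic, the sampling interval `DT_PS`, the helper names, and the two hand-picked frequencies (1.0 and 2.0 THz) are hypothetical, and the hand-picking merely stands in for a feature selection step such as VCPA-IRIV, which is not implemented here. It is not the authors' code or data.

```python
import numpy as np

def to_frequency_domain(traces, dt_ps):
    """Convert time-domain traces (n_samples x n_points) to amplitude spectra.

    dt_ps is the sampling interval in picoseconds, so frequencies are in THz.
    """
    spectra = np.abs(np.fft.rfft(traces, axis=1))
    freqs_thz = np.fft.rfftfreq(traces.shape[1], d=dt_ps)
    return freqs_thz, spectra

def knn_predict(train_x, train_y, test_x, k=3):
    """Plain k-nearest-neighbours majority vote (Euclidean, binary labels 0/1)."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

def make_traces(n, mixed, rng, n_points=256, dt_ps=0.05):
    """Synthetic stand-in for THz traces: 'mixed' samples get an extra 2 THz component."""
    t = np.arange(n_points) * dt_ps
    decay = np.exp(-t / 5.0)
    base = np.sin(2 * np.pi * 1.0 * t) * decay
    extra = 0.5 * np.sin(2 * np.pi * 2.0 * t) * decay if mixed else 0.0
    noise = 0.01 * rng.standard_normal((n, n_points))
    return base + extra + noise

rng = np.random.default_rng(0)
DT_PS = 0.05  # hypothetical sampling interval

# Label 0 = pure, 1 = mixed (synthetic data only).
train = np.vstack([make_traces(20, False, rng), make_traces(20, True, rng)])
train_y = np.array([0] * 20 + [1] * 20)
test = np.vstack([make_traces(10, False, rng), make_traces(10, True, rng)])
test_y = np.array([0] * 10 + [1] * 10)

freqs, train_spec = to_frequency_domain(train, DT_PS)
_, test_spec = to_frequency_domain(test, DT_PS)

# Hand-picked bins nearest 1.0 and 2.0 THz, standing in for selected feature frequencies.
selected = [int(np.argmin(np.abs(freqs - f))) for f in (1.0, 2.0)]

pred = knn_predict(train_spec[:, selected], train_y, test_spec[:, selected], k=3)
accuracy = float((pred == test_y).mean())
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

Modeling on a handful of frequency bins rather than the full spectrum keeps the distance computation small, which is the kind of speed benefit the abstract attributes to feature frequency selection; the accuracy printed here reflects only the deliberately easy synthetic data.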

     
