Anomaly detection of pond aquaculture water-quality data streams based on an improved SVDD algorithm

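  • Abstract: Water-quality data streams acquired by wireless sensor networks are highly complex, non-stationary, and nonlinear. To improve anomaly detection on sensor data streams and ensure the validity of water-quality monitoring data, this study proposes a water-quality data-stream anomaly detection method based on an improved Support Vector Data Description (SVDD). First, the Mahalanobis distance is used to improve the Gaussian window function of the Parzen-Window estimator, avoiding interference between data during classification. Next, the improved Parzen-Window is used to estimate the distribution density of the training data; combined with a fuzzy membership function, this estimate is used to apply density compensation to the traditional SVDD algorithm. The resulting improved SVDD anomaly detection model reduces the interference of noisy normal samples and improves classification accuracy. Finally, Density Weighted Support Vector Data Description (D-SVDD), traditional SVDD, and FastFood were selected for comparative experiments on multiple test datasets from different experimental ponds. The results show that the improved SVDD algorithm achieves high detection performance: across the 3 ponds, its highest anomaly detection true positive rate (TPR) reached 99.83% and its highest detection accuracy reached 99.83%, clearly outperforming D-SVDD and traditional SVDD, with a minimum running time of only 1.34 s. The results can provide technical support for anomaly monitoring of water-quality data streams.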
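The density-compensation step described above can be sketched as follows: a Parzen-window density estimate whose Gaussian window uses the Mahalanobis distance, followed by a mapping from density to fuzzy membership. This is a minimal sketch under stated assumptions, not the authors' implementation. The bandwidth `h`, the small diagonal `ridge`, and the min-max membership with a lower `floor` are all hypothetical choices, since the abstract does not give the exact membership function, and every function name here is illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(data: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of the training data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in data:
        diff = [row[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / (n - 1)
    return cov


def invert(m: Matrix, ridge: float = 1e-6) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; the (hypothetical) ridge
    added to the diagonal keeps a near-singular covariance invertible."""
    d = len(m)
    aug = [[m[i][j] + (ridge if i == j else 0.0) for j in range(d)]
           + [1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis_sq(x: Sequence[float], y: Sequence[float], inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - y)^T S^-1 (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return sum(diff[a] * inv_cov[a][b] * diff[b] for a in range(d) for b in range(d))


def parzen_density(data: Sequence[Sequence[float]], h: float = 1.0) -> List[float]:
    """Parzen-window density at each training point with a Gaussian window over
    the Mahalanobis distance. The constant 1 / ((2*pi)^(d/2) h^d |S|^(1/2)) is
    omitted: it is identical for every point and cancels in the min-max step.
    Each point's own window term (exp(0) = 1) is included in its sum."""
    inv_cov = invert(covariance(data))
    n = len(data)
    return [sum(math.exp(-mahalanobis_sq(x, y, inv_cov) / (2 * h * h)) for y in data) / n
            for x in data]


def fuzzy_membership(density: Sequence[float], floor: float = 0.1) -> List[float]:
    """Hypothetical membership: min-max normalize density into [floor, 1] so that
    low-density (likely noisy) samples get small weights."""
    lo, hi = min(density), max(density)
    span = hi - lo or 1.0  # all densities equal -> every weight becomes `floor`
    return [floor + (1.0 - floor) * (p - lo) / span for p in density]
```

For example, for the hypothetical two-sensor readings `[[7.1, 8.2], [7.0, 8.0], [7.2, 8.1], [6.9, 7.9], [9.5, 3.0]]`, `fuzzy_membership(parzen_density(readings, h=1.0))` assigns the isolated last reading the minimum weight (the floor, 0.1). In a fuzzy SVDD formulation, weights like these would typically scale each sample's slack penalty (C·s_i·ξ_i), so that low-membership samples pull the hypersphere boundary less; whether the paper uses exactly this form is an assumption. With so few samples the outlier also inflates the covariance estimate, so in practice the covariance would be estimated from a larger training window.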

     

    Abstract: Anomaly detection in data streams is one of the most critical tasks in water-quality monitoring for aquaculture. Water-quality data streams collected by wireless sensor networks are difficult to screen accurately for anomalies because they are highly complex, non-stationary, and nonlinear. Under class imbalance, traditional support vector data description (SVDD) recognizes the small number of abnormal samples poorly, and noisy samples also interfere strongly with detection, so that the characteristic features of the data cannot be captured completely. In this study, an improved support vector data description (improved SVDD, ISVDD) was proposed to strengthen anomaly detection on sensor data streams. First, the Mahalanobis distance was used to improve the Gaussian window function of the Parzen-Window estimator, avoiding interference between data during classification. Then, the improved Parzen-Window function was used to estimate the density of the training data and thereby characterize its distribution. Combining this density estimate with a fuzzy membership function, the ISVDD model was constructed; this reduced the interference of noisy samples with the model and improved classification accuracy. Finally, the anomaly detection performance of SVDD with different kernel functions was compared to select the optimal kernel function.
Density-weighted support vector data description (D-SVDD), traditional SVDD, and FastFood were selected as baselines on different test datasets from three ponds: D-SVDD to verify the advantage of the fuzzy membership function introduced by the improvement, traditional SVDD to verify the gain in detection precision of the improved SVDD, and FastFood to compare running efficiency. Each test was repeated several times, and the average values were taken as the final results. The true positive rate (TPR), false positive rate (FPR), accuracy, and running time were used to evaluate the detection performance of all models. The experimental results showed that the improved SVDD achieved higher detection performance: its maximum TPR was 99.83%, its minimum FPR was zero, its maximum anomaly detection accuracy was 99.83%, and its minimum running time was 1.34 s, outperforming D-SVDD and traditional SVDD. Differences in detection performance were observed among the SVDD kernel functions and among the detection methods in all test ponds throughout the aquaculture period. In addition, combining density weighting with the fuzzy membership function widened the boundary between normal and abnormal data and substantially improved anomaly detection. These findings provide a new approach to improving the accuracy of anomaly detection over the whole aquaculture cycle, and the improved SVDD and the experimental results may serve as a theoretical reference for raising the level of anomaly-detection supervision.
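The evaluation metrics named above (TPR, FPR, accuracy) follow from a confusion matrix, and the reported figures are averages over repeated runs. The sketch below assumes anomalies are the positive class (label 1) and normal readings are label 0; the abstract does not state the labeling convention, and the function names `detection_metrics` and `average_runs` are hypothetical.

```python
from typing import Dict, List, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """TPR, FPR and accuracy, assuming label 1 = abnormal (positive), 0 = normal."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal data flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal data accepted
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,  # TP / (TP + FN)
        "FPR": fp / (fp + tn) if fp + tn else 0.0,  # FP / (FP + TN)
        "Accuracy": (tp + tn) / len(pairs),
    }


def average_runs(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over repeated test runs."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}
```

For example, with `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 1, 1, 0]`, one of two anomalies is caught and one of four normal readings is flagged, giving TPR = 0.5, FPR = 0.25, and accuracy 4/6 ≈ 0.667. Note that FPR is computed over the normal class only, which is why a detector can reach an FPR of zero while its TPR stays below 100%.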

     
