Ethereum Ponzi Scheme Detection Based on Undersampling and Source Code Graph Representation

龚晓元; 刘冬明; 高峰; 师自通

Abstract: Existing methods for detecting Ponzi schemes on Ethereum suffer from class-imbalanced data, features drawn from a single source, and an inability to fully express the semantic and syntactic relationships and program dependencies of smart contracts. To address these problems, this paper proposes a detection method based on undersampling and source code graph representation. The method uses the Levenshtein algorithm to compute the distances between majority-class smart contracts in the training set, clusters the majority-class contracts with the K-Means algorithm, and selectively discards majority-class contracts. This keeps the fraudulent and normal contracts in the training set relatively balanced and makes the classifier more sensitive to anomalous contracts. The graph construction algorithm is also improved to target the code features of Ponzi scheme contracts: redundant features are removed and new core nodes are added to deconstruct the semantic and syntactic information and program dependencies of a smart contract, which makes it easier for the neural network to capture and learn the behavioral characteristics and fund flow patterns of fraudulent contracts. Experiments on the XBlock dataset show that the proposed method reaches a recall of 98% while maintaining precision, outperforming existing methods.
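The undersampling step described in the abstract (Levenshtein distances between majority-class contracts, followed by K-Means clustering and selective discarding) can be illustrated with a minimal sketch. The abstract does not say how the distances are fed to K-Means or which contracts in each cluster are kept, so the choices below are assumptions made for illustration only: each contract is represented by its row of the pairwise distance matrix, and one contract per cluster (the one nearest the centroid) is retained. The function names `distance_rows`, `kmeans`, and `undersample_majority` are hypothetical and do not come from the paper.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def distance_rows(items):
    """Row i holds the edit distances from item i to every item."""
    n = len(items)
    rows = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rows[i][j] = rows[j][i] = levenshtein(items[i], items[j])
    return rows


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def undersample_majority(majority, n_keep, seed=0):
    """Assumed rule: keep the contract nearest each cluster centroid."""
    rows = distance_rows(majority)
    labels, centroids = kmeans(rows, n_keep, seed=seed)
    kept = []
    for c in range(n_keep):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: sq_dist(rows[i], centroids[c]))
            kept.append(majority[best])
    return kept
```

Calling `undersample_majority(normal_contracts, n_keep)` with `n_keep` close to the number of Ponzi contracts would give a roughly balanced training set. The pairwise distance computation is quadratic in the number of contracts, so a real pipeline would likely need an optimized edit-distance library or shortened contract representations.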
