Abstract:
To address class imbalance, reliance on a single source of features, and the inability to fully express the semantic and syntactic relationships and program dependencies of smart contracts in Ethereum Ponzi scheme detection, a detection method based on undersampling and source code graph representation is proposed. The method uses the Levenshtein algorithm to compute pairwise distances among the majority-class smart contracts in the training set, clusters these contracts with the K-Means algorithm, and selectively discards majority-class contracts so that fraudulent and normal contracts in the training set are relatively balanced, which improves the sensitivity of the classifier to abnormal contracts. In addition, the graph construction algorithm is adapted to the code features of Ponzi scheme contracts: redundant features are removed and new core nodes are added to restructure the semantic and syntactic information and program dependencies of smart contracts, making it easier for the neural network to capture and learn the behavioral characteristics and fund flow patterns of fraudulent contracts. Experiments on the XBlock dataset show that the proposed method achieves a recall of 98% while maintaining precision, outperforming existing methods.
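
The undersampling step described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each majority-class contract is embedded as its row of pairwise Levenshtein distances before K-Means is applied, and that contracts are kept from each cluster in proportion to cluster size; the abstract specifies neither choice. The names `levenshtein`, `kmeans`, and `undersample_majority` are hypothetical.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def undersample_majority(majority, target_size, k=2, seed=0):
    """Keep at most target_size majority contracts, drawn from every cluster."""
    # Assumption: a contract's feature vector is its row of pairwise distances.
    dist = [[levenshtein(a, b) for b in majority] for a in majority]
    labels = kmeans(dist, k, seed=seed)
    rng = random.Random(seed)
    kept = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Assumption: per-cluster quota proportional to cluster size.
        quota = max(1, round(target_size * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    # Rounding can overshoot the target, so cap the number of kept indices.
    return [majority[i] for i in sorted(kept[:target_size])]
```

In use, `majority` would hold the source code strings of the normal contracts and `target_size` would be chosen close to the number of Ponzi contracts in the training set. Sampling from every cluster is one way to discard contracts while keeping the variety of the majority class.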