Abstract:
With the rapid development of agricultural informatization, vast amounts of data are being generated, but these data are dispersed and fragmented. The agricultural knowledge system is large and complex, and relevant data are usually stored separately across different research institutions and databases. As a result, collecting and integrating the data comprehensively is difficult, which leads to data sparsity and missing information when constructing agricultural knowledge graphs. Neural network-based knowledge graph completion methods extract latent semantic information from a knowledge graph through convolution, pooling, and other operations, and in domain-specific knowledge graph applications they adapt better to the features of domain data. However, in complex relational scenarios, existing neural network-based completion models still struggle to capture long-range dependencies effectively. To address these issues, this paper took wheat as an example and proposed a two-stage knowledge graph completion model, RASE-ARKGC. In the first stage, large language models (LLMs) were used for rule augmentation (RA). This module leveraged both the semantic and the structural information of the knowledge graph to prompt LLMs to generate logical rules. First, paths were sampled from the knowledge graph to represent its structural information. Second, an LLM-based rule generator mined potential rules from the semantic and structural information. Third, a logical rule sequencer evaluated the quality of the rules and filtered out meaningless ones. The rule augmentation module effectively expanded the dataset and alleviated data sparsity in the knowledge graph. In the second stage, a completion model combining a channel attention mechanism and dilated convolution (SE-ARKGC) was introduced.
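The rule augmentation pipeline (sample paths, generate candidate rules, rank and filter them, then infer new triples) can be illustrated with a small sketch. This is not the authors' implementation: the LLM-based rule generator is not reproduced, so candidate rules are read directly off closed length-2 paths, and the logical rule sequencer is approximated by standard confidence (support divided by the number of body groundings), a common rule-mining quality measure. The toy triples, relation names, and the `min_conf` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); NOT taken from WheatSeedBiz.
TRIPLES = [
    ("variety_A", "bred_by", "institute_1"),
    ("institute_1", "located_in", "province_X"),
    ("variety_A", "suitable_region", "province_X"),
    ("variety_B", "bred_by", "institute_2"),
    ("institute_2", "located_in", "province_Y"),
    ("variety_B", "suitable_region", "province_Y"),
    ("variety_C", "bred_by", "institute_3"),
    ("institute_3", "located_in", "province_X"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    out = defaultdict(list)
    for h, r, t in triples:
        out[h].append((r, t))
    return out


def sample_closed_paths(triples):
    """Find length-2 paths x -r1-> y -r2-> z that close an existing triple (x, r, z).

    Each distinct (r1, r2, r) is a candidate rule: r1(x, y) & r2(y, z) -> r(x, z).
    (Stands in for the LLM rule generator, which is not modeled here.)
    """
    out = build_index(triples)
    candidates = set()
    for x, r, z in triples:
        for r1, y in out.get(x, []):
            for r2, z2 in out.get(y, []):
                if z2 == z:
                    candidates.add((r1, r2, r))
    return candidates


def body_groundings(out, r1, r2):
    """All (x, z) pairs for which the rule body r1(x, y) & r2(y, z) holds."""
    pairs = set()
    for x, edges in out.items():
        for rel1, y in edges:
            if rel1 != r1:
                continue
            for rel2, z in out.get(y, []):
                if rel2 == r2:
                    pairs.add((x, z))
    return pairs


def score_and_filter(triples, candidates, min_conf=0.5):
    """Rank rules by standard confidence and drop those below min_conf (assumed sequencer)."""
    out = build_index(triples)
    facts = set(triples)
    kept = []
    for r1, r2, r in candidates:
        body = body_groundings(out, r1, r2)
        support = sum(1 for x, z in body if (x, r, z) in facts)
        conf = support / len(body) if body else 0.0
        if conf >= min_conf:
            kept.append(((r1, r2, r), conf))
    return sorted(kept, key=lambda item: item[1], reverse=True)


def augment(triples, rules):
    """Apply the kept rules to infer triples that are not yet in the graph."""
    out = build_index(triples)
    facts = set(triples)
    new = set()
    for (r1, r2, r), _conf in rules:
        for x, z in body_groundings(out, r1, r2):
            if (x, r, z) not in facts:
                new.add((x, r, z))
    return sorted(new)


rules = score_and_filter(TRIPLES, sample_closed_paths(TRIPLES))
print(rules)
print(augment(TRIPLES, rules))  # → [('variety_C', 'suitable_region', 'province_X')]
```

Here the single mined rule `bred_by(x, y) & located_in(y, z) -> suitable_region(x, z)` holds for two of its three body groundings (confidence 2/3), so it survives the filter and adds one missing triple, which is the sense in which rule augmentation expands a sparse graph.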
The channel attention mechanism enhanced the model's ability to capture long-range dependencies, while dilated convolution used different dilation rates to obtain multi-scale receptive fields, better capturing the multi-level interactions between entities and relations. In addition, to further enhance the model's expressive ability, residual modules were used in the network to mitigate the vanishing gradient problem in deep networks and to ensure the complete transfer of feature information across multiple layers. To validate the effectiveness of the proposed two-stage model, experiments were conducted on the self-constructed wheat knowledge graph WheatSeedBiz. RASE-ARKGC achieved an MRR of 0.482 and a Hits@10 of 0.555; compared with ConvE, MRR and Hits@10 improved by 9.8% and 10.2%, respectively. Experiments were also conducted on the open-source datasets WN18 and FB15k to evaluate the effectiveness and generalization of SE-ARKGC, and the model achieved the best or second-best performance compared with multiple baseline models. To verify the contribution of each component, ablation experiments evaluated the rule augmentation module and the SE-Block-based knowledge graph completion module, and the results showed that each module improved model performance. Overall, RASE-ARKGC not only effectively expanded the dataset but also achieved the best performance in the knowledge graph completion task. RASE-ARKGC provides an improved approach to knowledge graph completion in the wheat domain while demonstrating generalizability that makes it applicable to other domains.
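How dilated convolution, SE-style channel attention, and a residual shortcut fit together can be shown with a dependency-free sketch. The abstract does not give SE-ARKGC's layer configuration, so everything below is an assumption: a 1D signal stands in for the reshaped embeddings (ConvE-style models typically reshape entity and relation embeddings into a 2D grid), the kernels, dilation rates, and excitation weights `w1`/`w2` are hypothetical, biases are omitted, and the single-channel input is added to every channel as an illustrative identity shortcut.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 'same' 1D cross-correlation whose taps are `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation  # one output sees span + 1 input positions
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def se_block(channels, w1, w2):
    """Squeeze-and-Excitation: average-pool each channel, pass the result through a
    bottleneck (ReLU, then sigmoid), and rescale each channel by its weight.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); biases are omitted.
    """
    squeezed = [sum(ch) / len(ch) for ch in channels]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    scaled = [[weights[c] * v for v in ch] for c, ch in enumerate(channels)]
    return scaled, weights


def se_dilated_residual(x, kernels, dilations, w1, w2):
    """One hypothetical block: multi-scale dilated convs (one per channel),
    SE channel reweighting, then an identity shortcut followed by ReLU."""
    feats = [dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)]
    scaled, weights = se_block(feats, w1, w2)
    # Residual: add the input back so the original signal passes through unchanged.
    out = [[max(0.0, v + xi) for v, xi in zip(ch, x)] for ch in scaled]
    return out, weights


x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # an impulse at position 3
print(dilated_conv1d(x, [1.0, 1.0, 1.0], 1))  # taps spread over positions 2..4
print(dilated_conv1d(x, [1.0, 1.0, 1.0], 2))  # same 3 taps now cover positions 1..5

out, weights = se_dilated_residual(
    x,
    kernels=[[1.0, 1.0, 1.0]] * 3,
    dilations=[1, 2, 3],
    w1=[[0.5, 0.5, 0.5]],    # reduction ratio r = 3: 3 channels -> 1 hidden unit
    w2=[[1.0], [1.0], [1.0]],
)
print(weights)
```

The impulse response makes the multi-scale effect visible: a 3-tap kernel with dilation 2 spans five positions instead of three without adding parameters, while the SE weights decide how much each scale contributes before the shortcut adds the input back.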