Abstract:
Specialty fruits have become an essential category in the e-commerce market for agricultural products, and brand development has been a significant trend in recent years. Brand building requires steady improvement in the quality of specialty fruits in order to meet the growing, diversified, and personalized demands of the consumer market. The massive volume of consumer reviews accumulated over time on e-commerce platforms reflects consumers' purchasing preferences, consumption psychology, and behavioral patterns. Product quality, competitive advantage in the market, and the dynamic evolution of brand influence are directly correlated. Multimodal sentiment analysis is therefore a promising tool for evaluating e-commerce sales. However, several challenges remain. Processing massive multimodal reviews in big data environments is inefficient because of high computational complexity. Severe information redundancy commonly arises during multimodal fusion and consumes enormous computational resources. Overall performance is also impaired because the importance and complementarity of modalities, such as text and images, differ across fruit sales evaluation scenarios. To overcome these technical bottlenecks, this study proposed a hierarchical dynamic neighborhood sentiment analysis method that alleviates the semantic gap and representation inconsistency among modalities. An aligned image-text fusion mechanism was constructed to reduce the redundancy of feature information, and a low-rank image-text fusion mechanism was used to improve computational efficiency.
In addition, a hierarchical dynamic neighborhood fusion mechanism was designed to capture the contextual information of neighborhood nodes at each level of a hierarchical structure. Bottom-up iterative optimization was used to achieve deep fusion and collaborative enhancement of multi-granularity features. For the experimental dataset, large-scale review data on specialty fruit sales were systematically collected from two major e-commerce platforms, JD.com and Taobao. A multimodal review dataset was then constructed to cover multiple categories of specialty fruits, including navel oranges, pomelos, and apples, with multidimensional information such as text reviews, product images, and user ratings. Experiments were carried out to validate the performance of the proposed method. Classification accuracies of 90.76% and 89.45% and macro-F1 scores of 78.75% and 85.04% were achieved on the specialty fruit sales evaluation datasets from the JD.com and Taobao e-commerce platforms, respectively. In single-modal text classification, the method performed better in text feature extraction and semantic understanding: compared with the Bidirectional Encoder Representations from Transformers-Bidirectional Long Short-Term Memory (BERT-BiLSTM) baseline model, accuracy improved by 15.56% and 8.62%, and macro-F1 increased by 1.88% and 4.25%, respectively. In multimodal fusion tasks, the effectiveness of the multimodal fusion strategy was also verified: compared with Distribution-based Feature Recovery and Fusion (DRF), accuracy improved by 10.27% and 5.14%, and macro-F1 increased by 2.77% and 3.00%, respectively. Computational efficiency also improved after optimization.
The total testing time was reduced by 18.93% and 15.14%, respectively, compared with the Tensor Fusion Network (TFN), demonstrating the efficiency and scalability of the method in practical applications. The method is expected to improve the robustness and practicality of review evaluation, to help address the class imbalance problem, and to enrich the theoretical framework of natural language processing. The findings can also provide a feasible technical solution for multimodal sentiment analysis of fruit e-commerce sales reviews.
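
The abstract reports both accuracy and macro-F1 and mentions class imbalance. The following minimal Python sketch illustrates how the two metrics can diverge when one sentiment class dominates. The label names and counts are hypothetical toy values chosen for illustration and are not drawn from the JD.com or Taobao datasets; the helper functions `accuracy` and `macro_f1` are likewise illustrative, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced review set: 90 positive, 8 neutral, 2 negative.
y_true = ["pos"] * 90 + ["neu"] * 8 + ["neg"] * 2
# Hypothetical predictions: most minority-class reviews are labeled "pos".
y_pred = ["pos"] * 90 + ["pos"] * 6 + ["neu"] * 2 + ["neg"] * 1 + ["pos"] * 1

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case the accuracy is 0.93 while the macro-F1 is roughly 0.68, because errors on the small neutral and negative classes weigh as heavily as the dominant positive class in the macro average. This is one plausible reading of why a reported macro-F1 can sit well below the accuracy on an imbalanced review dataset.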