Abstract:
The joint analysis of multimodal omics data plays a significant role in revealing cellular heterogeneity and elucidating the mechanisms that regulate cell fate. A variety of methods have been developed for integrating multi-omics modalities. This study evaluates the performance of several data integration methods on different integration tasks, providing a reference for research in related fields. First, 16 integration methods for paired single-cell multimodal data were tested on 6 joint sequencing datasets across 2 integration tasks. Second, 6 spatial transcriptomic deconvolution methods were assessed on four simulated datasets and one real dataset. In the paired RNA and ATAC integration task, MOFA+, SCOIT, and Cobolt achieved the best performance on the PBMC, BMMC, and SNARE datasets, respectively, and SCOIT ranked in the top three by aggregate score on all three datasets. Among the autoencoder (AE)-based fusion algorithms, MMDVAE and DAE stood out. In the paired RNA and protein integration task, Cobolt, MOFA+, and Seurat achieved the best performance on the P5_CITE, BM_CITE, and COVID datasets, respectively, and totalVI ranked highly by aggregate score on all three datasets. Among the AE-based fusion algorithms, efMMDVAE and lfMMDVAE performed best. In the evaluation of spatial transcriptomic deconvolution methods, Cell2location and SPACEL outperformed the other methods on both simulated and real data, and Cell2location performed best on the real dataset, accurately inferring the proportions of two types of cardiomyocytes in the ventricles. Moreover, the methods differ in how well they adapt to different datasets in the paired integration tasks. SCOIT and totalVI were the most stable strong performers for RNA with ATAC and RNA with protein integration, respectively, whereas the performance of Seurat and MOFA+ was sensitive to the choice of dataset.