
DNA 5-methylcytosine Site Prediction Based on Transformer Encoder and Nanopore Data

ZENG Jia, CHEN Ling-ling

Citation: ZENG Jia, CHEN Ling-ling. DNA 5-methylcytosine Site Prediction Based on Transformer Encoder and Nanopore Data[J]. Genomics and Applied Biology, 2023, 42(12): 1344-1352. DOI: 10.13417/j.gab.042.001344


Funding:

Supported by the National Natural Science Foundation of China (32270712)

    Corresponding author:

    CHEN Ling-ling, llchen@gxu.edu.cn

  • CLC number: Q811.4


  • Abstract: 5-methylcytosine (5mC) in DNA is produced when a DNA methyltransferase covalently attaches a methyl group to the fifth carbon atom of the cytosine ring. It is widely present in different tissues and plays an important role in various biological processes. Methylation modifications are commonly studied through their methylation sites, so accurate identification of 5mC sites is crucial for a deeper understanding of their biological function. With the rapid development of artificial intelligence, deep learning has become an important analytical tool in bioinformatics, and a growing number of biological problems are being addressed with it. The Transformer is a deep learning model based on the attention mechanism. In this study, features are extracted from Nanopore sequencing data, a third-generation sequencing technology, encoded by a Transformer encoder, and finally fed into a bidirectional long short-term memory (LSTM) network to predict 5mC sites. The model was trained and tested on Nanopore sequencing data from Arabidopsis thaliana and rice (Oryza sativa). The results show that the model can effectively extract latent features of 5mC sites and thereby improve 5mC site prediction.
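The abstract characterizes the Transformer only as an attention-based model. As a rough illustration of what that means, the sketch below implements single-head scaled dot-product self-attention in plain Python. It is not the authors' implementation: the sequence length, feature dimensions, random projection matrices, and the random "features" standing in for per-base Nanopore signal features are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is a (seq_len, d_in) matrix with one feature row per position;
    w_q, w_k, w_v are (d_in, d_k) projection matrices.
    Returns the attended output (seq_len, d_k) and the attention
    weights (seq_len, seq_len).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    # scores[i][j] = (q_i . k_j) / sqrt(d_k)
    scores = [
        [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        for q_i in q
    ]
    # Each row of weights says how much position i attends to every position j.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a matrix of small Gaussian values."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_in, d_k = 5, 4, 3  # assumed sizes, chosen only for illustration
features = random_matrix(seq_len, d_in, rng)  # placeholder for real signal features
w_q, w_k, w_v = (random_matrix(d_in, d_k, rng) for _ in range(3))
output, weights = self_attention(features, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so every output position is a weighted average of the value vectors of all positions in the window. In a pipeline like the one the abstract describes, encoded positions of this kind would then be passed to a recurrent layer, here a bidirectional LSTM, for the final prediction.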

Publication history
  • Published: 2023-12-24
