iPSI(2L)-EDL: a Two-layer Predictor for Identifying Promoters and their Types based on Ensemble Deep Learning


Abstract

Promoters are DNA fragments located near the transcription initiation site. According to their transcriptional activation and expression levels, they can be divided into strong and weak promoter types. Identifying promoters and their strengths in DNA sequences is essential for understanding the regulation of gene expression. It is therefore crucial to further improve the predictive quality of predictors to meet real-world application requirements. Here, we constructed an up-to-date training dataset based on the RegulonDB website; all promoters in this dataset have been experimentally validated, and their sequence similarity is less than 85%. We used one-hot encoding and nucleotide chemical property and density (NCPD) to represent DNA sequence samples. Additionally, we proposed an ensemble deep learning framework containing a multi-head attention module, a long short-term memory module, and a convolutional neural network module.
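As an illustration of the sequence representation mentioned above, the following minimal Python sketch encodes a DNA sequence with one-hot and NCPD features. The abstract does not give the exact mapping, so the chemical property codes (ring structure, functional group, hydrogen bonding) follow the convention commonly used in the literature, and the function name `encode_sequence` is hypothetical.

```python
# Assumed one-hot order A, C, G, T (not specified in the abstract).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

# Assumed chemical property codes: (ring structure, functional group, hydrogen bond).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def encode_sequence(seq):
    """Return one 8-dimensional feature row per nucleotide.

    Each row is one-hot (4 values) + chemical properties (3 values) +
    density (1 value), where density at position i is the fraction of
    the first i nucleotides that equal the nucleotide at position i.
    """
    counts = {base: 0 for base in "ACGT"}
    rows = []
    for i, base in enumerate(seq.upper(), start=1):
        if base not in counts:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        counts[base] += 1
        density = counts[base] / i
        rows.append(list(ONE_HOT[base] + NCP[base] + (density,)))
    return rows


features = encode_sequence("ACGA")
```

For the toy input "ACGA", each of the four positions yields an 8-dimensional vector; a matrix of this kind could then be passed to attention, LSTM, or CNN modules.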

The results showed that iPSI(2L)-EDL outperformed existing methods for both promoter prediction and the identification of strong and weak promoter types. On independent testing data, the AUC and MCC of iPSI(2L)-EDL in identifying promoters were improved by 2.23% and 2.96%, respectively, compared with those of PseDNC-DL, while the AUC and MCC of iPSI(2L)-EDL in predicting promoter strength type were increased by 3.74% and 5.86%, respectively. The ablation experiments indicate that the CNN plays a crucial role in recognizing promoters, and that the importance of different input positions and long-range dependency relationships among features are also helpful for recognizing promoters.
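For readers unfamiliar with the two metrics reported above, the following Python sketch shows how MCC and AUC can be computed from a confusion matrix and from prediction scores. It uses the standard textbook definitions, not the authors' evaluation code; the function names and the example inputs are hypothetical and are not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy inputs, for illustration only.
perfect_mcc = mcc(tp=50, tn=50, fp=0, fn=0)
chance_mcc = mcc(tp=25, tn=25, fp=25, fn=25)
perfect_auc = auc([0.9, 0.8], [0.1, 0.2])
```

MCC ranges from -1 to 1 and AUC from 0 to 1, so percentage improvements in the two metrics are not directly comparable in scale.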

Furthermore, to make it easier for most experimental scientists to get the results they need, a user-friendly web server has been established and can be accessed at http://47.94.248.117/IPSW(2L)-EDL.

About the authors

Xuan Xiao

Department of Computer, Jing-De-Zhen Ceramic University

Principal contact for editorial correspondence.
Email: info@benthamscience.net

Zaihao Hu

Department of Computer, Jing-De-Zhen Ceramic University

Email: info@benthamscience.net

ZhenTao Luo

Department of Computer, Jing-De-Zhen Ceramic University

Email: info@benthamscience.net

Zhaochun Xu

Department of Computer, Jing-De-Zhen Ceramic University

Email: info@benthamscience.net



Copyright © Bentham Science Publishers, 2024