Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy
- Authors
- Hasan, MM[Hasan, Md Mehedi]; Tsukiyama, S[Tsukiyama, Sho]; Cho, JY[Cho, Jae Youl]; Kurata, H[Kurata, Hiroyuki]; Alam, MA[Alam, Md Ashad]; Liu, XW[Liu, Xiaowen]; Manavalan, B[Manavalan, Balachandran]; Deng, HW[Deng, Hong-Wen]
- Issue Date
- 3-Aug-2022
- Publisher
- CELL PRESS
- Keywords
- baseline models; bioinformatics; deep learning; epigenetic regulation; machine learning; prediction model; RNA N5-methylcytosine; sequence analysis; stacking framework; systematic evaluation
- Citation
- MOLECULAR THERAPY, v.30, no.8, pp.2856 - 2867
- Indexed
- SCIE
- SCOPUS
- Journal Title
- MOLECULAR THERAPY
- Volume
- 30
- Number
- 8
- Start Page
- 2856
- End Page
- 2867
- URI
- https://scholarx.skku.edu/handle/2021.sw.skku/99282
- DOI
- 10.1016/j.ymthe.2022.05.001
- ISSN
- 1525-0016
- Abstract
- As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. It is therefore important to identify m5C modifications accurately in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their models were developed using small training datasets, so their practical application to genome-wide detection is quite limited. To overcome these limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Four variants of deep-learning classifiers and four commonly used conventional classifiers were then trained with each of the four encodings, yielding 32 baseline models. A stacking strategy was applied by integrating the predicted outputs of the optimal baseline models and using them to train a one-dimensional (1D) convolutional neural network. The resulting Deepm5C predictor achieved a Matthews correlation coefficient of 0.697 and an accuracy of 0.855 during cross-validation; the corresponding values on the independent test were 0.691 and 0.852. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of the proposed hybrid framework. Deepm5C is expected to assist community-wide efforts in identifying putative m5C sites and in formulating novel, testable biological hypotheses.
- Appears in Collections
- Biotechnology and Bioengineering > Integrative Biotechnology > 1. Journal Articles
- OTHERS > ETC > 1. Journal Articles
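
The abstract reports two metrics, the Matthews correlation coefficient (MCC) and accuracy, and a pool of 32 baseline models (eight classifiers crossed with four encodings). The following is a minimal Python sketch of how such a grid can be enumerated and how the two metrics are computed from binary predictions. It is not the authors' code: the classifier and encoding labels are placeholders, since the abstract gives only their counts, and the toy labels are invented for illustration.

```python
import math
from itertools import product

# Placeholder labels (assumption): the abstract states only the counts,
# i.e. four deep-learning classifiers, four conventional classifiers,
# and four feature encodings. The real names are not given there.
classifiers = [f"deep_{i}" for i in range(1, 5)] + [
    f"conventional_{i}" for i in range(1, 5)
]
encodings = [f"encoding_{i}" for i in range(1, 5)]

# Every (classifier, encoding) pair is one baseline model: 8 x 4 = 32.
baseline_models = list(product(classifiers, encodings))


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks an m5C site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)

print(len(baseline_models), accuracy(*counts), mcc(*counts))
```

MCC uses all four confusion-matrix cells, so it is less easily inflated by class imbalance than accuracy, which is one common reason to report the two together.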
