Detailed Information

Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy

Authors
Hasan, MM [Hasan, Md Mehedi]; Tsukiyama, S [Tsukiyama, Sho]; Cho, JY [Cho, Jae Youl]; Kurata, H [Kurata, Hiroyuki]; Alam, MA [Alam, Md Ashad]; Liu, XW [Liu, Xiaowen]; Manavalan, B [Manavalan, Balachandran]; Deng, HW [Deng, Hong-Wen]
Issue Date
3-Aug-2022
Publisher
CELL PRESS
Keywords
baseline models; bioinformatics; deep learning; epigenetic regulation; machine learning; prediction model; RNA N5-methylcytosine; sequence analysis; stacking framework; systematic evaluation
Citation
MOLECULAR THERAPY, v.30, no.8, pp.2856 - 2867
Indexed
SCIE
SCOPUS
Journal Title
MOLECULAR THERAPY
Volume
30
Number
8
Start Page
2856
End Page
2867
URI
https://scholarx.skku.edu/handle/2021.sw.skku/99282
DOI
10.1016/j.ymthe.2022.05.001
ISSN
1525-0016
Abstract
As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models were developed using small training datasets. Hence, their practical application to genome-wide detection is quite limited. To overcome these limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were each trained with the four encodings, yielding 32 baseline models. A stacking strategy was then applied: the predicted outputs of the optimal baseline models were integrated and used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics on the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5C sites and in formulating novel testable biological hypotheses.
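The two metrics reported in the abstract, and the count of 32 baseline models, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the placeholder classifier and encoding labels, and the confusion-matrix counts are all assumptions made for illustration, since the abstract names neither the individual classifiers nor the encodings.

```python
import math
from itertools import product


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Placeholder labels (hypothetical): the abstract only gives the counts,
# four deep-learning variants, four conventional classifiers, four encodings.
classifiers = [f"deep_variant_{i}" for i in range(1, 5)]
classifiers += [f"conventional_{i}" for i in range(1, 5)]
encodings = [f"encoding_{i}" for i in range(1, 5)]

# Every classifier paired with every encoding gives 8 x 4 = 32 baselines.
baseline_models = list(product(classifiers, encodings))

if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the paper.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(len(baseline_models), "baseline models")
    print("accuracy:", round(accuracy(tp, tn, fp, fn), 3))
    print("MCC:", round(mcc(tp, tn, fp, fn), 3))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it stays informative when positive and negative classes are imbalanced.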
Appears in Collections
Biotechnology and Bioengineering > Integrative Biotechnology > 1. Journal Articles
OTHERS > ETC > 1. Journal Articles

