NeRNA: A negative data generation framework for machine learning applications of noncoding RNAs

dc.contributor.author Orhan, Mehmet Emin
dc.contributor.author Demirci, Yilmaz Mehmet
dc.contributor.author Sacar Demirci, Muserref Duygu
dc.contributor.authorID 0000-0002-1757-1374 en_US
dc.contributor.authorID 0000-0003-3802-4211 en_US
dc.contributor.authorID 0000-0003-2012-0598 en_US
dc.contributor.department AGÜ, Faculty of Life and Natural Sciences, Department of Bioengineering en_US
dc.contributor.institutionauthor Orhan, Mehmet Emin
dc.contributor.institutionauthor Demirci, Yilmaz Mehmet
dc.contributor.institutionauthor Sacar Demirci, Muserref Duygu
dc.date.accessioned 2023-07-18T07:01:39Z
dc.date.available 2023-07-18T07:01:39Z
dc.date.issued 2023 en_US
dc.description.abstract Many supervised machine learning-based noncoding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning datasets usually consist of known examples of ncRNAs, some of which may have weak or strong experimental validation. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To overcome this challenge, a novel negative data generation method, NeRNA (negative RNA), is developed in this work. NeRNA uses known examples of given ncRNA sequences and their calculated structures for octal representation to create negative sequences in a manner similar to frameshift mutations but without deletion or insertion. NeRNA is tested individually with four different ncRNA datasets: microRNA (miRNA), transfer RNA (tRNA), long noncoding RNA (lncRNA), and circular RNA (circRNA). Furthermore, a species-specific case analysis is performed to demonstrate and compare the performance of NeRNA for miRNA prediction. The results of 1000-fold cross-validation on Decision Tree, Naïve Bayes, and Random Forest classifiers, and on deep learning algorithms such as Multilayer Perceptron, Convolutional Neural Network, and simple feedforward neural networks, indicate that models obtained using NeRNA-generated datasets achieve substantially high prediction performance. NeRNA is released as an easy-to-use, updatable, and modifiable KNIME workflow that can be downloaded with example datasets and required extensions. In particular, NeRNA is designed to be a powerful tool for RNA sequence data analysis. en_US
dc.identifier.endpage 8 en_US
dc.identifier.issn 0010-4825
dc.identifier.issn 1879-0534
dc.identifier.other WOS:000988971900001
dc.identifier.startpage 1 en_US
dc.identifier.uri https://doi.org/10.1016/j.compbiomed.2023.106861
dc.identifier.uri https://hdl.handle.net/20.500.12573/1635
dc.identifier.volume 159 en_US
dc.language.iso eng en_US
dc.publisher PERGAMON-ELSEVIER SCIENCE en_US
dc.relation.isversionof 10.1016/j.compbiomed.2023.106861 en_US
dc.relation.journal COMPUTERS IN BIOLOGY AND MEDICINE en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject RNA en_US
dc.subject Noncoding RNA en_US
dc.subject Data generation en_US
dc.subject Machine learning en_US
dc.title NeRNA: A negative data generation framework for machine learning applications of noncoding RNAs en_US
dc.type article en_US

Files

Original bundle

Name:
1-s2.0-S0010482523003268-main.pdf
Size:
6 MB
Format:
Adobe Portable Document Format
Description:
Article File

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed upon to submission
Description: