NeRNA: A Negative Data Generation Framework for Machine Learning Applications of Noncoding RNAs

dc.contributor.author Orhan, Mehmet Emin
dc.contributor.author Demirci, Yilmaz Mehmet
dc.contributor.author Demirci, Mueserref Duygu Sacar
dc.contributor.author Saçar Demirci, Müşerref Duygu
dc.date.accessioned 2025-09-25T10:53:03Z
dc.date.available 2025-09-25T10:53:03Z
dc.date.issued 2023
dc.description Orhan, Mehmet Emin/0000-0002-1757-1374; Sacar Demirci, Muserref Duygu/0000-0003-2012-0598 en_US
dc.description.abstract Many supervised machine learning-based noncoding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning datasets usually consist of known examples of ncRNAs, some of which may have weak or strong experimental validation. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To overcome this challenge, a novel negative data generation method, NeRNA (negative RNA), is developed in this work. NeRNA uses known examples of given ncRNA sequences and their calculated structures for octal representation to create negative sequences in a manner similar to frameshift mutations but without deletion or insertion. NeRNA is tested individually with four different ncRNA datasets: microRNA (miRNA), transfer RNA (tRNA), long noncoding RNA (lncRNA), and circular RNA (circRNA). Furthermore, a species-specific case analysis is performed to demonstrate and compare the performance of NeRNA for miRNA prediction. The results of 1000-fold cross-validation on Decision Tree, Naive Bayes, and Random Forest classifiers, and on deep learning algorithms such as Multilayer Perceptron, Convolutional Neural Network, and simple feedforward neural networks, indicate that models obtained by using NeRNA-generated datasets achieve substantially high prediction performance. NeRNA is released as an easy-to-use, updatable, and modifiable KNIME workflow that can be downloaded with example datasets and required extensions. In particular, NeRNA is designed to be a powerful tool for RNA sequence data analysis. en_US
dc.identifier.doi 10.1016/j.compbiomed.2023.106861
dc.identifier.issn 0010-4825
dc.identifier.issn 1879-0534
dc.identifier.scopus 2-s2.0-85152486090
dc.identifier.uri https://doi.org/10.1016/j.compbiomed.2023.106861
dc.identifier.uri https://hdl.handle.net/20.500.12573/4266
dc.language.iso en en_US
dc.publisher Pergamon-Elsevier Science Ltd en_US
dc.relation.ispartof Computers in Biology and Medicine en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Rna en_US
dc.subject Noncoding Rna en_US
dc.subject Data Generation en_US
dc.subject Machine Learning en_US
dc.title NeRNA: A Negative Data Generation Framework for Machine Learning Applications of Noncoding RNAs en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Orhan, Mehmet Emin/0000-0002-1757-1374
gdc.author.id Sacar Demirci, Muserref Duygu/0000-0003-2012-0598
gdc.author.scopusid 58071029100
gdc.author.scopusid 36674901500
gdc.author.scopusid 55735789200
gdc.author.wosid Demirci, Müşerref/N-7458-2017
gdc.author.wosid Sacar Demirci, Muserref Duygu/N-7458-2017
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Orhan, Mehmet Emin] Abdullah Gul Univ, Grad Sch Engn & Sci, Dept Bioengn, Kayseri, Turkiye; [Demirci, Yilmaz Mehmet] Abdullah Gul Univ, Fac Engn, Dept Engn Sci, Kayseri, Turkiye; [Demirci, Mueserref Duygu Sacar] Abdullah Gul Univ, Fac Life & Nat Sci, Dept Bioengn, Kayseri, Turkiye en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.volume 159 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W4364367854
gdc.identifier.pmid 37075604
gdc.identifier.wos WOS:000988971900001
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.diamondjournal false
gdc.oaire.impulse 2.0
gdc.oaire.influence 2.5137836E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Machine Learning
gdc.oaire.keywords MicroRNAs
gdc.oaire.keywords RNA, Untranslated
gdc.oaire.keywords Bayes Theorem
gdc.oaire.keywords RNA, Long Noncoding
gdc.oaire.keywords RNA, Circular
gdc.oaire.keywords Algorithms
gdc.oaire.popularity 3.5316379E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 0.4953
gdc.openalex.normalizedpercentile 0.72
gdc.opencitations.count 2
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 11
gdc.plumx.scopuscites 2
gdc.scopus.citedcount 2
gdc.virtual.author Demirci, Yılmaz Mehmet
gdc.virtual.author Saçar Demirci, Müşerref Duygu
gdc.wos.citedcount 2
relation.isAuthorOfPublication 4c089860-8459-445d-90cc-c13394882f01
relation.isAuthorOfPublication 99fd1cc2-69da-4eaa-a58d-7425d0459b6f
relation.isAuthorOfPublication.latestForDiscovery 4c089860-8459-445d-90cc-c13394882f01
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 26c938e5-738e-41bf-8231-8de593870236
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication 4eea69bf-e8aa-4e3e-ab18-7587ac1d841b
relation.isOrgUnitOfPublication 5519c95e-5bcb-45e5-8ce1-a8b4bcf7c7b9
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef

Files

Original bundle

Name:
1-s2.0-S0010482523003268-main.pdf
Size:
6 MB
Format:
Adobe Portable Document Format
Description:
Article File

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed upon to submission