NeRNA: A Negative Data Generation Framework for Machine Learning Applications of Noncoding RNAs

Date

2023

Publisher

Pergamon-Elsevier Science Ltd

Open Access Color

Green Open Access

No

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Abstract

Many supervised machine-learning-based noncoding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning datasets usually consist of known examples of ncRNAs, some of which may even have weak or strong experimental validation. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To overcome this challenge, a novel negative data generation method, NeRNA (negative RNA), is developed in this work. NeRNA uses known examples of given ncRNA sequences and their calculated structures, encoded in an octal representation, to create negative sequences in a manner similar to frameshift mutations but without deletion or insertion. NeRNA is tested individually with four different ncRNA datasets: microRNA (miRNA), transfer RNA (tRNA), long noncoding RNA (lncRNA), and circular RNA (circRNA). Furthermore, a species-specific case analysis is performed to demonstrate and compare the performance of NeRNA for miRNA prediction. The results of 1000-fold cross-validation on Decision Tree, Naive Bayes, and Random Forest classifiers, and on deep learning algorithms such as Multilayer Perceptron, Convolutional Neural Network, and simple feedforward neural networks, indicate that models trained on NeRNA-generated datasets achieve substantially high prediction performance. NeRNA is released as an easy-to-use, updatable, and modifiable KNIME workflow that can be downloaded with example datasets and the required extensions. In particular, NeRNA is designed to be a powerful tool for RNA sequence data analysis.
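
The abstract gives only a high-level description of the encoding step, so the following Python sketch is an illustration of the general idea rather than the published NeRNA procedure. It assumes a dot-bracket structure string, a hypothetical mapping of each (nucleotide, paired or unpaired) position to one octal digit, and a one-bit rotation of the resulting 3-bit codes as the frameshift-like step. All function names and the mapping itself are assumptions made for this example.

```python
# Illustrative only: the digit mapping and the bit rotation are assumptions,
# not the procedure implemented in the NeRNA KNIME workflow.
NUCLEOTIDES = "ACGU"


def encode_octal(sequence, structure):
    """Map each (nucleotide, paired?) position to an octal digit 0-7."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    digits = []
    for base, symbol in zip(sequence.upper(), structure):
        paired = symbol in "()"  # dot-bracket notation: '.' means unpaired
        # Raises ValueError for any base outside A, C, G, U.
        digits.append(NUCLEOTIDES.index(base) + (4 if paired else 0))
    return digits


def shift_frame(digits, offset=1):
    """Rotate the concatenated 3-bit codes by `offset` bits, then re-read
    them in 3-bit frames. Nothing is inserted or deleted, so the number of
    digits is unchanged."""
    bits = "".join(format(d, "03b") for d in digits)
    if not bits:
        return []
    offset %= len(bits)
    rotated = bits[offset:] + bits[:offset]
    return [int(rotated[i:i + 3], 2) for i in range(0, len(rotated), 3)]


def decode_sequence(digits):
    """Drop the paired/unpaired flag and return the nucleotide string."""
    return "".join(NUCLEOTIDES[d % 4] for d in digits)


def make_negative(sequence, structure, offset=1):
    """Produce a same-length candidate negative sequence."""
    return decode_sequence(shift_frame(encode_octal(sequence, structure), offset))
```

Under these assumptions, `make_negative("GCAU", "(..)")` encodes the input as the digits 6, 1, 0, 7, rotates the bit string by one position, and decodes the result to `"AGCU"`, which has the same length as the input.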

Description

Orhan, Mehmet Emin/0000-0002-1757-1374; Sacar Demirci, Muserref Duygu/0000-0003-2012-0598

Keywords

RNA; Noncoding RNA; Data Generation; Machine Learning; MicroRNAs; RNA, Untranslated; Bayes Theorem; RNA, Long Noncoding; RNA, Circular; Algorithms

WoS Q

Q1

Scopus Q

Q1
OpenCitations Citation Count
2

Source

Computers in Biology and Medicine

Volume

159

PlumX Metrics
Citations

CrossRef : 2

Scopus : 2

Captures

Mendeley Readers : 11

SCOPUS™ Citations

2

checked on Mar 06, 2026

Web of Science™ Citations

2

checked on Mar 06, 2026

Page Views

1

checked on Mar 06, 2026

Downloads

5

checked on Mar 06, 2026

OpenAlex FWCI
0.4953