Integrating Gene Ontology Based Grouping and Ranking Into the Machine Learning Algorithm for Gene Expression Data Analysis

Date

2021

Publisher

Springer International Publishing AG

Green Open Access

No

Publicly Funded

No
Impulse
Top 10%
Influence
Average
Popularity
Top 10%

Abstract

Recent advances in high-throughput technologies have resulted in the production of large gene expression datasets for several phenotypes. By comparing gene expression levels under different conditions, such as disease vs. control, treated vs. untreated, or drug A vs. drug B, one can identify biomarkers. As opposed to traditional gene selection approaches, integrative gene selection approaches incorporate domain knowledge from external biological resources during gene selection, which improves interpretability and predictive performance. In this respect, Gene Ontology provides cellular component, molecular function, and biological process terms for the products of each gene. In this study, we present a Gene Ontology-based feature selection approach for gene expression data analysis. In our approach, we use the ontology information as grouping (term) information and embed this information into a machine learning algorithm to select the most significant groups (terms) of the ontology. These groups are then used to build the machine learning model that performs the classification task. The output of the tool is a set of significant ontology groups for 2-class classification of gene expression data. This knowledge allows researchers to perform more advanced gene expression analyses. We tested our approach on 8 different gene expression datasets. In our experiments, we observed that the tool successfully found significant Ontology terms that can be used to build a classification model. We believe that our tool will help geneticists identify affected genes in transcriptomic data, and this information could enable the design of platforms to assist diagnosis, assess patients' prognoses, and create patient treatment plans.
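
The abstract describes three steps: grouping genes by their Gene Ontology terms, scoring each group with a machine learning algorithm, and building a classifier from the top-ranked groups. It does not name the classifier or the scoring criterion, so the following Python sketch is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the paper's algorithm, k-fold cross-validated accuracy serves as the group score, and the gene names, GO term IDs (GO:A, GO:B, GO:C), and expression values are hypothetical toy data. All function names are likewise hypothetical.

```python
# Minimal sketch of GO-based grouping, group scoring, and ranking.
# ASSUMPTIONS (not from the paper): nearest-centroid classifier,
# k-fold cross-validated accuracy as the group score, synthetic toy data.
import random
from collections import defaultdict


def group_genes_by_term(annotations):
    """Invert a gene -> set-of-GO-terms mapping into GO term -> sorted genes."""
    groups = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            groups[term].add(gene)
    return {term: sorted(genes) for term, genes in groups.items()}


def fit_centroids(X, y, idx):
    """Mean expression vector per class, restricted to the columns in idx."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        vec = [row[i] for i in idx]
        prev = sums.get(label, [0.0] * len(idx))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, row, idx):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    vec = [row[i] for i in idx]
    return min(
        sorted(centroids),
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
    )


def cv_accuracy(X, y, idx, k=5, seed=0):
    """k-fold cross-validated accuracy on columns idx (requires k >= 2)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(k):
        test = set(order[f::k])  # folds partition the samples
        train = [i for i in order if i not in test]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train], idx)
        correct += sum(predict(cents, X[i], idx) == y[i] for i in test)
    return correct / len(X)


def rank_groups(X, y, gene_names, groups, k=5, seed=0):
    """Score every GO group by CV accuracy; return (term, score) best-first."""
    col = {g: i for i, g in enumerate(gene_names)}
    scores = {}
    for term, genes in groups.items():
        idx = [col[g] for g in genes if g in col]
        if idx:  # skip terms with no measured genes
            scores[term] = cv_accuracy(X, y, idx, k, seed)
    # Ties are broken alphabetically by term ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def build_final_model(X, y, gene_names, groups, ranking, top_t=1):
    """Fit the classifier on the union of genes from the top_t ranked groups."""
    col = {g: i for i, g in enumerate(gene_names)}
    genes = sorted({g for term, _ in ranking[:top_t] for g in groups[term] if g in col})
    idx = [col[g] for g in genes]
    return genes, idx, fit_centroids(X, y, idx)


# Hypothetical toy data: G1 and G2 are shifted by +5 in class 1; others are noise.
rng = random.Random(42)
gene_names = ["G1", "G2", "G3", "G4", "G5", "G6"]
y = [0] * 10 + [1] * 10
X = [[rng.gauss(5.0 if lab == 1 and g in ("G1", "G2") else 0.0, 1.0)
      for g in gene_names] for lab in y]
annotations = {"G1": {"GO:A"}, "G2": {"GO:A", "GO:B"}, "G3": {"GO:B"},
               "G4": {"GO:C"}, "G5": {"GO:C"}, "G6": {"GO:B"}}

groups = group_genes_by_term(annotations)
ranking = rank_groups(X, y, gene_names, groups)
print(ranking)
genes, idx, model = build_final_model(X, y, gene_names, groups, ranking)
print(genes, predict(model, [5.0, 5.0, 0.0, 0.0, 0.0, 0.0], idx))
```

Swapping in a different classifier or scoring metric only requires changing `fit_centroids`, `predict`, and `cv_accuracy`; the grouping and ranking logic stays the same. On real data, the gene-to-term annotations would come from a Gene Ontology annotation source rather than a hand-written dictionary.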

WoS Q

N/A

Scopus Q

Q4
OpenCitations Citation Count
9

Source

Communications in Computer and Information Science

Volume

1479

Start Page

205

End Page

214
PlumX Metrics
Citations

CrossRef : 1

Scopus : 16

Captures

Mendeley Readers : 6

SCOPUS™ Citations

17

checked on Mar 04, 2026

Web of Science™ Citations

12

checked on Mar 04, 2026

Page Views

2

checked on Mar 04, 2026

OpenAlex FWCI
10.2304
