The Effect of Different Classifiers on Recursive Cluster Elimination in the Analysis of Transcriptomic Data

Abstract

Gene expression data with limited sample size and a large number of genes are frequently encountered in genetic studies. In such high-dimensional data, identification of genes that distinguish between disease states is a challenging task. Feature selection (FS) is a useful approach in dealing with high dimensionality. Support Vector Machines Recursive Cluster Elimination (SVM-RCE) is a technique for FS in highdimensional data. The SVM-RCE approach has been utilized for identification of clusters of genes whose expression levels correlate with pathological state. A key step in SVM-RCE is the use of an SVM classifier to assign an area under the curve (AUC) score to each gene cluster based on its ability to predict class labels. In this study, we investigate the use of alternative classifiers in the cluster-scoring step. Specifically, we compare Support Vector Machines, Random Forest, XgBoost, Naive Bayes, and linear logistic regression. In addition to AUC score performance evaluation, the algorithms are compared in terms of the number of selected genes at different levels of clustering and in terms of the running time.

Description

Keywords

Recursive Cluster Elimination, Feature Selection, Clustering, Gene Expression Data Analysis

Turkish CoHE Thesis Center URL

Citation

WoS Q

Scopus Q

Source

Volume

Issue

Start Page

1

End Page

5