Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

Now showing 1 - 8 of 8
  • Article
    Citation - Scopus: 4
    RCE-IFE: Recursive Cluster Elimination with Intra-Cluster Feature Elimination
    (PeerJ Inc., 2025-02-07) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, Malik
  • Conference Object
    Impact of Gene Duplicate Handling Strategies on Classification Performance and Feature Selection in Gene Expression Data
    (Institute of Electrical and Electronics Engineers Inc., 2025-09-17) Kuzudisli, Cihan; Qaqish, Bahjat; Gungor, Burcu Bakir; Yousef, Malik
  • Article
    Citation - WoS: 48
    Citation - Scopus: 65
    Review of Feature Selection Approaches Based on Grouping of Features
    (PeerJ Inc, 2023-07-17) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Bulut, Nurten; Qaqish, Bahjat; Yousef, Malik
    With the rapid development of technology, large amounts of high-dimensional data have been generated. This high dimensionality, with its redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that first groups features and then scores groups of features rather than individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each group. Approaches under supervised, unsupervised, semi-supervised, and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid, and multi-method approaches. For biological data, the involvement of external biological sources can improve analysis results. We hope the findings of this work can guide the effective design of new FS approaches using feature grouping.
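The grouping idea described in this abstract (form groups of similar features, then keep a representative from each group) can be illustrated with a minimal sketch. It is not taken from the review: the greedy correlation-based grouping, the 0.9 threshold, the rule for choosing a representative, and all function names are assumptions made for illustration only.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def group_features(columns, threshold=0.9):
    """Greedy grouping (an assumed, simplistic rule): a feature joins the first
    group whose seed feature it correlates with at or above the threshold."""
    groups = []  # each group is a list of feature names; the first name is the seed
    for name in columns:
        for group in groups:
            if abs(pearson(columns[name], columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def select_representatives(columns, labels, groups):
    """From each group, keep the feature most correlated with the class labels."""
    return [
        max(group, key=lambda n: abs(pearson(columns[n], labels)))
        for group in groups
    ]


# Toy data: g2 closely tracks g1, and g4 closely tracks g3.
columns = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "g3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "g4": [5.2, 0.9, 4.1, 2.2, 5.8, 3.1],
}
labels = [0, 0, 0, 1, 1, 1]

groups = group_features(columns)
representatives = select_representatives(columns, labels, groups)
print(groups)
print(representatives)
```

On the toy data the four features collapse into two groups, and one feature per group is retained, which halves the feature set while keeping one member of each correlated pair.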
  • Article
    Citation - WoS: 2
    Citation - Scopus: 4
    RCE-IFE: Recursive Cluster Elimination With Intra-Cluster Feature Elimination
    (PeerJ Inc, 2025-02-07) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, Malik
    The computational and interpretational difficulties caused by the ever-increasing dimensionality of biological data generated by new technologies pose a significant challenge. Feature selection (FS) methods aim to reduce the dimension, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant features. In this work, we propose the Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE) method, which utilizes feature grouping and iterates grouping and elimination steps in a supervised context. We assess the dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains. For a set of gene expression, microRNA (miRNA) expression, and methylation datasets, the performance of RCE-IFE is comparatively evaluated with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. On average, RCE-IFE attains an area under the curve (AUC) of 0.85 among the tested expression datasets with the fewest features and the shortest running time, while RCE-IFE-SVM and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively, when averaged over seven different metagenomics datasets, with RCE-IFE significantly reducing feature subsets. Furthermore, RCE-IFE surpasses several state-of-the-art FS methods, such as Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE gives a similar average accuracy rate of 89.27% using fewer features on four cancer-related datasets. The comparability of RCE-IFE is also verified with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet. Additionally, the biological relevance of the features selected by RCE-IFE is evaluated. The proposed method also exhibits high consistency in terms of the selected features across multiple runs. Our experimental findings imply that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.
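To make the "iterate grouping and elimination" idea concrete, the following is a loose sketch of a group-score-eliminate loop. It is not the authors' RCE-IFE implementation: the round-robin grouping stands in for a real clustering step, the class-mean difference stands in for a classifier-based score, and every name and parameter (`make_groups`, `feature_score`, `n_groups`, `stop_at`) is hypothetical.

```python
from statistics import mean


def feature_score(values, labels):
    """Absolute difference of class means; an assumed stand-in for a
    classifier-based score such as cross-validated accuracy."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def make_groups(names, n_groups):
    """Placeholder grouping: deal features into n_groups round-robin.
    A real pipeline would cluster similar features together instead."""
    groups = [names[i::n_groups] for i in range(n_groups)]
    return [g for g in groups if g]


def recursive_group_elimination(columns, labels, n_groups=4, stop_at=2):
    """Repeat: group features, drop the weakest group, then drop the weakest
    feature inside each surviving group. Stops once at most `stop_at`
    features remain or only one group would be left."""
    names = list(columns)
    while len(names) > stop_at and n_groups > 1:
        groups = make_groups(names, n_groups)
        # Rank groups by the mean score of their member features, best first.
        ranked = sorted(
            groups,
            key=lambda g: mean(feature_score(columns[f], labels) for f in g),
            reverse=True,
        )
        kept = []
        for group in ranked[:-1]:  # group elimination: discard the weakest group
            if len(group) > 1:
                # Intra-group elimination: discard the weakest member.
                weakest = min(group, key=lambda f: feature_score(columns[f], labels))
                group = [f for f in group if f != weakest]
            kept.extend(group)
        names = [n for n in names if n in kept]  # keep the original feature order
        n_groups -= 1
    return names


labels = [0, 0, 0, 1, 1, 1]
columns = {
    "strong": [0, 0, 0, 10, 10, 10],
    "n1": [1, 2, 3, 1, 2, 3],
    "n2": [0, 1, 0, 1, 0, 1],
    "n3": [2, 2, 2, 2, 2, 3],
    "n4": [5, 4, 5, 4, 5, 4],
    "n5": [1, 0, 0, 0, 0, 1],
    "n6": [3, 3, 3, 3, 3, 3],
    "n7": [0, 0, 1, 1, 1, 1],
}
print(recursive_group_elimination(columns, labels))
```

The sketch shrinks the number of groups on every pass so the loop is guaranteed to terminate; how the published method schedules cluster counts and elimination rates is not stated in the abstract and is not reproduced here.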
  • Conference Object
    Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes
    (IEEE, 2024-05-15) Ersoz, Nur Sebnem; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
    The human gut microbiome comprises over 10 trillion microbes and plays important roles in maintaining metabolism and body homeostasis and in shaping immune function. Metagenomics, which studies genomic data from clinical and environmental samples, is crucial for understanding the interplay between the host and the gut microbiome. Functional profiling of metagenomes has recently helped to identify alterations in microbial functions, particularly in enzyme-encoding genes. Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths. In this study, we aimed to find CRC-associated enzymes by analyzing metagenomic data with different machine learning methods. A total of 1262 samples, including CRC and control groups from different countries, were used. The dataset was obtained by functionally profiling metagenomic data and estimating community-level enzyme commission (EC) abundance values. Two group-based feature selection methods, RCE-IFE and SVM-RCE, were compared with six individual feature selection methods, using 10 repetitions of Monte Carlo cross-validation. RCE-IFE, Extreme Gradient Boosting, and Select K Best performed similarly and provided the best results. Beyond its high performance, the group-based feature selection method RCE-IFE, unlike TFS, grouped enzymes into clusters and then identified biologically relevant CRC-associated enzymes.
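Monte Carlo cross-validation, as mentioned in this abstract, repeatedly draws a fresh random train/test partition instead of rotating through fixed folds. A minimal sketch follows; the 10 repetitions and the 1262 samples come from the abstract, while the 10% test fraction, the seed, and the function name `monte_carlo_splits` are assumptions, since the abstract does not state the split ratio.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=10, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for each repetition.
    Every repetition reshuffles all samples, so test sets may overlap
    across repetitions, unlike k-fold cross-validation."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


# 1262 samples and 10 repetitions as in the abstract; the split ratio is assumed.
for repeat, (train, test) in enumerate(monte_carlo_splits(1262), start=1):
    print(repeat, len(train), len(test))
```

In practice a model would be fitted on each training partition and its metric (for example AUC) averaged over the repetitions.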
  • Conference Object
    Citation - Scopus: 2
    Effect of Recursive Cluster Elimination With Different Clustering Algorithms Applied to Gene Expression Data
    (Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Kuzudisli, Cihan; Bakir-Güngör, Burcu; Qaqish, Bahjat F.; Yousef, Malik
    Feature selection (FS) is an effective tool for dealing with high dimensionality and reducing computational cost. Support Vector Machines-Recursive Cluster Elimination (SVM-RCE) is one of several algorithms that have been developed for FS in high-dimensional data. SVM-RCE involves a clustering step, originally performed with k-means. Using various performance metrics, three alternative algorithms are evaluated in this context: k-medoids, Hierarchical Clustering (HC), and Gaussian Mixture Model (GMM). Comparisons are carried out on five publicly available gene expression datasets. The results show that k-means in SVM-RCE achieves higher classification performance than the other tested algorithms, while HC performs similarly to k-means. Our findings support the use of k-means. This study can contribute to the development of SVM-RCE with different variations, leading to a decrease in the number of selected genes and an increase in prediction performance.
  • Conference Object
    Colorectal Cancer Prediction via Applying Recursive Cluster Elimination With Intra-Cluster Feature Elimination on Metagenomic Pathway Data
    (Springer International Publishing AG, 2024) Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
    Advances in next-generation sequencing and in "-omics" technologies enable the characterization of the human gut microbiome. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental influences, and abnormalities in the gut microbiota. The aim of this study is to identify pathways that influence host metabolism in CRC patients. The CRC-related metagenomic dataset used in this study contains the relative abundance values of 551 pathways calculated for 1262 samples. Here, two different approaches based on feature grouping reduce the number of features by considering relevant features as groups, eliminate irrelevant features, and perform classification. The recursive cluster elimination with intra-cluster feature elimination (RCE-IFE) approach achieves an AUC of 0.72 using an average of 66.2 features on the CRC-associated metagenomics dataset. In these experiments, the P163-PWY: L-lysine fermentation to acetate and butanoate and PWY-6151: S-adenosyl-L-methionine cycle I pathways, which are reported by both approaches, are identified as potential biomarkers associated with CRC. This study contributes to the molecular diagnosis and treatment of colorectal cancer by revealing the pathways associated with CRC. Our results are promising for the study of the gut microbiota and its role in CRC.
  • Conference Object
    A Comparative Study on Psychiatric Disorders: Identification of Shared Pathways and Common Agents
    (Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Kuzudisli, Cihan; Bakir-Güngör, Burcu; Bakir Gungor, Burcu
    Distinct but closely related diseases generally present shared symptoms, which suggests possible overlaps among their pathogenic mechanisms. Identification of significantly impacted shared pathways and other common agents is expected to elucidate the etiology of these disorders and to help design better intervention strategies. In this research effort, we studied six psychiatric disorders: schizophrenia (SCZ), anorexia (AN), bipolar disorder (BD), depressive disorder (DD), autism (AU), and attention deficit hyperactivity disorder (ADHD). Our methodology consists of two parts: in Part I, common susceptibility genes, and in Part II, genome-wide association studies (GWAS) data were used to find enriched pathways of psychiatric disorders. 59 KEGG pathways were commonly identified in both parts, 31 of which are disease pathways. Pathways related to cancer and infectious diseases were predominant compared to others. Most of the identified pathways were in accordance with previous studies in the literature. A combination of susceptibility genes and GWAS data is an effective approach to identify significantly impacted pathways in multifactorial diseases. In this respect, shared modules were determined by applying hierarchical clustering to the enriched pathways. These identified modules may indicate the association of psychiatric disorders with the enriched pathways. Taken together, common pathways and shared modules are expected to highlight the causative factors and important mechanisms behind complex psychiatric diseases, leading to effective drug discovery.