Browsing by Author "Kuzudisli, Cihan"
Now showing 1 - 9 of 9
Conference Object
Colorectal Cancer Prediction via Applying Recursive Cluster Elimination With Intra-Cluster Feature Elimination on Metagenomic Pathway Data (Springer International Publishing AG, 2024)
Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
Advances in next-generation sequencing and in "-omics" technologies enable the characterization of the human gut microbiome. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental influences, and abnormalities in the gut microbiota. The aim of this study is to identify pathways that influence host metabolism in CRC patients. The CRC-related metagenomic dataset used in this study contains the relative abundance values of 551 pathways calculated for 1262 samples. Here, two different approaches based on feature grouping reduce the number of features by treating relevant features as groups, eliminate irrelevant features, and perform classification. The recursive cluster elimination with intra-cluster feature elimination (RCE-IFE) approach achieves an AUC of 0.72 using an average of 66.2 features on the CRC-associated metagenomics dataset. In these experiments, the P163-PWY: L-lysine fermentation to acetate and butanoate and PWY-6151: S-adenosyl-L-methionine cycle I pathways are identified as potential biomarkers associated with CRC. These experiments also reduce the number of features. Both approaches report these two pathways, which are therefore considered possible CRC-related biomarkers. This study contributes to the molecular diagnosis and treatment of colorectal cancer by revealing the pathways associated with CRC.
Our results are promising for the study of the gut microbiota and its role in CRC.

Conference Object
A Comparative Study on Psychiatric Disorders: Identification of Shared Pathways and Common Agents (Institute of Electrical and Electronics Engineers Inc., 2022)
Kuzudisli, Cihan; Bakir-Güngör, Burcu; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
Distinct but closely related diseases generally present shared symptoms, which point to possible overlaps among their pathogenic mechanisms. Identification of significantly impacted shared pathways and other common agents is expected to elucidate the etiology of these disorders and to help design better intervention strategies. In this research effort, we studied six psychiatric disorders: schizophrenia (SCZ), anorexia (AN), bipolar disorder (BD), depressive disorder (DD), autism (AU), and attention deficit hyperactivity disorder (ADHD). Our methodology has two parts: in Part I, common susceptibility genes, and in Part II, genome-wide association studies (GWAS) data, were used to find enriched pathways of psychiatric disorders. 59 KEGG pathways were identified in both parts, and 31 of these are disease pathways. Pathways related to cancer and infectious diseases were predominant compared to others. Most of the identified pathways were in accordance with previous studies in the literature. A combination of susceptibility genes and GWAS data is an effective approach to identify significantly impacted pathways in multifactorial diseases. In this respect, shared modules were determined by applying hierarchical clustering to the enriched pathways. These identified modules may reveal the association of psychiatric disorders with the enriched pathways.
Taken together, common pathways and shared modules are expected to highlight the causative factors and important mechanisms behind complex psychiatric diseases, leading to effective drug discovery.

Conference Object
Citation - Scopus: 2
Effect of Recursive Cluster Elimination With Different Clustering Algorithms Applied to Gene Expression Data (Institute of Electrical and Electronics Engineers Inc., 2023)
Kuzudisli, Cihan; Bakir-Güngör, Burcu; Qaqish, Bahjat F.; Yousef, Malik; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
Feature selection (FS) is an effective tool for dealing with high dimensionality and reducing computational cost. Support Vector Machines-Recursive Cluster Elimination (SVM-RCE) is one of several algorithms that have been developed for FS in high-dimensional data. SVM-RCE involves a clustering step, which originally uses k-means. Using various performance metrics, three alternative clustering algorithms are evaluated in this context: k-medoids, Hierarchical Clustering (HC), and Gaussian Mixture Model (GMM). Comparisons are carried out on five publicly available gene expression datasets. The results show that SVM-RCE with k-means achieves higher classification performance than with the other tested algorithms, while HC performs similarly to k-means. Overall, our findings favor k-means. This study can contribute to the development of SVM-RCE with different variations, leading to a decrease in the number of selected genes and an increase in prediction performance.

Conference Object
Identification of Shared Pathways Among Immune Related Diseases Utilizing Active Subnetworks (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2020)
Eryilmaz, Mahmut Kaan; Kuzudisli, Cihan; Gungor, Burcu Bakir; AGÜ, Faculty of Engineering, Department of Computer Engineering; 01. Abdullah Gül University; 02. 04.
Computer Engineering; 02. Faculty of Engineering
Different but related diseases often share symptoms, indicating possible overlaps in the underlying pathogenic mechanisms. The identification of the shared pathways and related factors across these diseases helps to better understand their causes and to prevent and treat them. In this study, using immune-related diseases, we propose a new method for comparing the development mechanisms of related diseases based on biological pathways. As developments in genomic technologies have made genotyping cheaper and easier, genome-wide association studies (GWAS) have emerged. By studying large case-control groups for a specific disease, such studies can identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). With the help of these studies, in which around a million SNPs are scanned, the variations and genes that could play a role in the development of a specific disease can be detected. In this study, seven immune-related diseases are analyzed using available GWAS datasets and the human protein-protein interaction network, by detecting active subnetworks and affected pathways. By investigating the similarities among the pathways identified for related diseases, we aim to define the underlying pathogenic mechanisms and hence to contribute to the elucidation of disease development mechanisms and to drug repositioning studies.

Conference Object
İmmün Bağlantılı Hastalıklarda Aktif Alt Ağ Araması ile Ortak Hastalık Oluşum Mekanizmalarının Tespiti [Detection of Common Disease Development Mechanisms in Immune-Related Diseases via Active Subnetwork Search] (IEEE, 2020)
Eryilmaz, Mahmut Kaan; Kuzudisli, Cihan; Gungor, Burcu Bakir; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
Different but related diseases often share symptoms, indicating possible overlaps in the underlying pathogenic mechanisms.
The identification of the shared pathways and related factors across these diseases helps to better understand their causes and to prevent and treat them. In this study, using immune-related diseases, we propose a new method for comparing the development mechanisms of related diseases based on biological pathways. As developments in genomic technologies have made genotyping cheaper and easier, genome-wide association studies (GWAS) have emerged. By studying large case-control groups for a specific disease, such studies can identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). With the help of these studies, in which around a million SNPs are scanned, the variations and genes that could play a role in the development of a specific disease can be detected. In this study, seven immune-related diseases are analyzed using available GWAS datasets and the human protein-protein interaction network, by detecting active subnetworks and affected pathways. By investigating the similarities among the pathways identified for related diseases, we aim to define the underlying pathogenic mechanisms and hence to contribute to the elucidation of disease development mechanisms and to drug repositioning studies.

Conference Object
Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes (IEEE, 2024)
Ersoz, Nur Sebnem; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
The human gut microbiome comprises over 10 trillion microbes and plays important roles in maintaining metabolism and body homeostasis and in shaping immune function. Metagenomics, which studies genomic data from clinical and environmental samples, is crucial for understanding the interplay between the host and the gut microbiome.
Recently, functional profiling of metagenomes has helped to identify alterations in microbial functions, particularly in enzyme-encoding genes. Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths. In this study, we aimed to find CRC-associated enzymes by analyzing metagenomic data with different machine learning methods. A total of 1262 samples, including CRC and control groups from different countries, were used. The dataset was obtained by functionally profiling metagenomics data and estimating community-level enzyme commission (EC) abundance values. For the analysis of this dataset, RCE-IFE and SVM-RCE, which are group-based feature selection methods, were compared with 6 different individual feature selection methods. Monte Carlo cross-validation with 10 repetitions was used in our experiments. RCE-IFE, Extreme Gradient Boosting, and Select K Best provided similarly high performance, the best among the tested methods. Notably, besides its high performance, the group-based feature selection method RCE-IFE grouped enzymes into clusters, unlike TFS, and then identified biologically relevant CRC-associated enzymes.

Conference Object
Citation - WoS: 1
Citation - Scopus: 1
Prediction of Type 2 Diabetes Using Metagenomic Data and Identification of Taxonomic Biomarkers (IEEE, 2024)
Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
Nowadays, -omics data on diseases are generated at different molecular levels, and analyzing these data with machine learning methods is a popular research topic. Among these data, metagenomic data are increasingly used to facilitate the diagnosis, detection, and treatment of diseases. Type 2 diabetes (T2D) is a chronic disease characterized by insulin resistance and progressive dysfunction of pancreatic beta cells.
While the number of people with diabetes is increasing by around 8% annually, the cost of treating the disease is rising by 18% per year. Therefore, the number of studies on the diagnosis, development, and progression of T2D is increasing over time. The aim of this study is to achieve better classification performance with fewer metagenomic features, and thereby to reduce computational cost. We compare the performance of three different methods on T2D-related metagenomic data. First, the MetaPhlAn tool is used to calculate the taxonomic species and their relative abundances in each sample. The SVM-RCE, RCE-IFE, and microBiomeGSM tools used in this study perform classification by grouping and scoring features and are known to work well on complex datasets. The best results were obtained with the RCE-IFE tool, which reached an AUC of 0.72 using an average of 125 features. In addition, key taxonomic species identified by these tools as associated with T2D are presented and compared with the literature.

Article
Citation - WoS: 1
Citation - Scopus: 2
RCE-IFE: Recursive Cluster Elimination With Intra-Cluster Feature Elimination (PeerJ Inc, 2025)
Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, Malik; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
The computational and interpretational difficulties caused by the ever-increasing dimensionality of biological data generated by new technologies pose a significant challenge. Feature selection (FS) methods aim to reduce the dimension, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant features.
In this work, we propose the Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE) method that utilizes feature grouping and iterates grouping and elimination steps in a supervised context. We assess dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains. For a set of gene expression, microRNA (miRNA) expression, and methylation datasets, the performance of RCE-IFE is comparatively evaluated with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. On average, RCE-IFE attains an area under the curve (AUC) of 0.85 among tested expression datasets with the fewest features and the shortest running time, while RCE-IFE-SVM and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively, when averaged over seven different metagenomics datasets, with RCE-IFE significantly reducing feature subsets. Furthermore, RCE-IFE surpasses several state-of-the-art FS methods, such as Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE gives a similar average accuracy rate of 89.27% using fewer features on four cancer-related datasets. The comparability of RCE-IFE is also verified with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet. Additionally, the biological relevance of the features selected by RCE-IFE is evaluated. The proposed method also exhibits high consistency in terms of the selected features across multiple runs.
Our experimental findings imply that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.

Article
Citation - WoS: 38
Citation - Scopus: 51
Review of Feature Selection Approaches Based on Grouping of Features (PeerJ Inc, 2023)
Kuzudisli, Cihan; Bakir-Gungor, Burcu; Bulut, Nurten; Qaqish, Bahjat; Yousef, Malik; 01. Abdullah Gül University; 02. 04. Computer Engineering; 02. Faculty of Engineering
With the rapid development of technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually, and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised, and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid, and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide the effective design of new FS approaches using feature grouping.
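
The grouping idea summarized in the review abstract above (form groups of similar features, then keep a representative from each group) can be illustrated with a minimal Python sketch. This is not the RCE-IFE or SVM-RCE algorithm and is not taken from any of the listed papers. The greedy correlation-based grouping, the 0.8 threshold, the function names, and the toy data are all assumptions made here purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_features(features, threshold=0.8):
    """Greedy grouping (an illustrative stand-in for a real clustering step).

    A feature joins the first group whose seed feature it correlates with at
    |r| >= threshold; otherwise it becomes the seed of a new group.
    """
    groups = []  # each group is a list of feature names; group[0] is the seed
    for name, values in features.items():
        for group in groups:
            if abs(pearson(values, features[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def select_representatives(features, labels, groups):
    """From each group, keep the feature most correlated with the class labels."""
    return [
        max(group, key=lambda f: abs(pearson(features[f], labels)))
        for group in groups
    ]


# Hypothetical toy data: g1 and g2 are near-duplicates, g3 is unrelated.
features = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [1.1, 2.1, 2.9, 4.2, 5.1, 5.9],
    "g3": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
labels = [0, 0, 0, 1, 1, 1]

groups = group_features(features)
selected = select_representatives(features, labels, groups)
print(groups)
print(selected)
```

On this toy input the two near-duplicate features fall into one group and only one of them is retained, which is the redundancy reduction that grouping-based FS aims for. A practical pipeline would replace the greedy step with a proper clustering algorithm and score groups with a classifier.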
