Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Citation - Scopus: 1
Feature Selection for Protein Dihedral Angle Prediction
(Institute of Electrical and Electronics Engineers Inc., 2017-09) Aydin, Zafer; Kaynar, Oğuz; Görmez, Yasin
Impact of Gene Duplicate Handling Strategies on Classification Performance and Feature Selection in Gene Expression Data
(Institute of Electrical and Electronics Engineers Inc., 2025-09-17) Kuzudisli, Cihan; Qaqish, Bahjat; Gungor, Burcu Bakir; Yousef, Malik
Enhancing Complex Disease Group Scoring with miRGediNET: A Multi-Algorithm Machine Learning Framework Based on the GSM Approach
(IEEE, 2025-06-25) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Integrating biological prior knowledge for disease gene associations has shown significant promise in discovering new biomarkers with potential translational applications. This work investigates the application of a multi-algorithm machine learning framework based on the Grouping-Scoring-Modeling (G-S-M) approach for improving the prediction of complex diseases. The study identifies the primary gene and miRNA interactions in various complex diseases with the help of miRGediNET, a machine-learning-based tool that integrates data from three biological databases. Whereas traditional methods treat features as independent, the G-S-M method aggregates genes into groups based on biological interactions, scores each gene group for a disease, and models its predictive capability using machine learning algorithms. Seven algorithms, including Support Vector Machine, Decision Tree, and CatBoost, were applied to eight datasets extracted from the GEO database. The framework proved robust in ranking gene clusters and predicting critical biomarkers under 100-fold randomized cross-validation. The results indicate this approach's high potential for refining disease prediction and for supporting the choice of the algorithm that best provides biological insights and computational advances.
Citation - Scopus: 1
The Identification of Discriminative Single Nucleotide Polymorphism Sets for the Classification of Behçet's Disease
(Institute of Electrical and Electronics Engineers Inc., 2018-09) Görmez, Yasin; Işik, Yunus Emre; Bakir-Güngör, Burcu
Behçet's disease is a long-term multisystem inflammatory disorder, characterized by recurrent attacks affecting several organs. As genotyping individuals became cheaper and easier following developments in genomic technologies, genome-wide association studies (GWAS) emerged. By studying large case-control groups for a specific disease, these studies identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). Although several genetic risk factors have been identified for Behçet's disease by scanning around a million SNPs, these variations could only explain up to 20% of the disease's genetic risk. In this study, for Behçet's disease classification, we compare all the SNPs genotyped in GWAS with the SNPs selected using genetic knowledge, gain ratio, and information gain, aiming both to reduce the feature size and to improve the classification accuracy. We also investigate the effect of different classification algorithms, such as random forest, k-nearest neighbour, and logistic regression, on the classification accuracy. Our results showed that, compared to other feature selection methods, selecting the SNPs using genetic information (their GWAS p-values, indicating the significance of the SNP against the disease) achieves a success rate of at least 81% and provides a 15% to 42% improvement in all classification algorithms. This improvement is statistically sound. While the gain ratio and information gain feature selection techniques yield similar classification accuracies, the models using all SNPs could not exceed 50% accuracy and resulted in the worst performance. © 2019 Elsevier B.V., All rights reserved.
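The information gain and gain ratio criteria named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the authors' implementation; the toy genotype vectors and the names `snp_a` and `snp_b` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def information_gain(feature, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: 1 = case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 0, 0, 0]
snp_a = [2, 2, 2, 0, 0, 0]  # separates cases from controls perfectly
snp_b = [0, 1, 2, 0, 1, 2]  # carries no information about the label

for name, snp in [("snp_a", snp_a), ("snp_b", snp_b)]:
    print(name, round(information_gain(snp, labels), 3), round(gain_ratio(snp, labels), 3))
```

Ranking SNPs by either score and keeping the top-ranked ones is one common way to reduce the feature set before classification.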
The Effect of Different Classifiers on Recursive Cluster Elimination in the Analysis of Transcriptomic Data
(Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Bulut, Nurten; Bakir-Güngör, Burcu; Qaqish, Bahjat F.; Yousef, Malik
Gene expression data with limited sample size and a large number of genes are frequently encountered in genetic studies. In such high-dimensional data, identification of genes that distinguish between disease states is a challenging task. Feature selection (FS) is a useful approach in dealing with high dimensionality. Support Vector Machines Recursive Cluster Elimination (SVM-RCE) is a technique for FS in high-dimensional data. The SVM-RCE approach has been utilized for identification of clusters of genes whose expression levels correlate with pathological state. A key step in SVM-RCE is the use of an SVM classifier to assign an area under the curve (AUC) score to each gene cluster based on its ability to predict class labels. In this study, we investigate the use of alternative classifiers in the cluster-scoring step. Specifically, we compare Support Vector Machines, Random Forest, XGBoost, Naive Bayes, and linear logistic regression. In addition to AUC score performance evaluation, the algorithms are compared in terms of the number of selected genes at different levels of clustering and in terms of the running time. © 2023 Elsevier B.V., All rights reserved.
Citation - WoS: 7
Citation - Scopus: 8
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
(IEEE Computer Soc, 2022-05-01) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder in which the etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting the knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods, incorporating biological information and aimed to develop a machine-learning model for diagnosing Behcet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling
(Springer International Publishing AG, 2025) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics selects cohesive topics extracted via Latent Dirichlet Allocation as features for document representation, effectively reducing dimensionality while preserving the semantic structure of the text. This study evaluates the performance of TextNetTopics utilizing multiple machine learning algorithms in the M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further enhance classification performance, we introduce TextNetTopics+, an ensemble-based extension that leverages both hard voting and soft voting mechanisms to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets demonstrate that ensemble learning in TextNetTopics+ significantly outperforms individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.
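The difference between the hard and soft voting mechanisms mentioned in this abstract can be shown with a minimal sketch. It is a generic illustration of the two voting rules and not the TextNetTopics+ code; the three classifiers, their probabilities, and the class names `topic_a` and `topic_b` are hypothetical.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the per-class probabilities and return the class with the highest mean."""
    classes = probabilities[0].keys()
    n = len(probabilities)
    mean = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(mean, key=mean.get)

# Hypothetical class probabilities from three classifiers for one document.
probs = [
    {"topic_a": 0.55, "topic_b": 0.45},
    {"topic_a": 0.60, "topic_b": 0.40},
    {"topic_a": 0.10, "topic_b": 0.90},
]
labels = [max(p, key=p.get) for p in probs]

hard = hard_vote(labels)  # "topic_a": two of the three classifiers prefer it
soft = soft_vote(probs)   # "topic_b": one confident classifier shifts the mean
print(hard, soft)
```

The two rules can disagree, as here, because soft voting takes each classifier's confidence into account while hard voting counts only the predicted labels.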
Citation - Scopus: 3
Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring-Modeling (G-S-M) Approach
(Elsevier Ltd, 2025-03) Bakir-Güngör, Burcu; Temiz, Mustafa; Canakcimaksutoglu, Beyza; Yousef, Malik
Colorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and serves as a biomarker for early detection and treatment. This research effort focuses on the identification of potential taxonomic biomarkers of CRC using a grouping-based feature selection method. Additionally, this study investigates the effect of incorporating biological domain knowledge into the feature selection process while identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose a taxonomy-based Grouping-Scoring-Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. In this study, using metagenomic data related to CRC, classification is performed at three taxonomic levels (genus, family, and order). The MetaPhlAn tool is employed to determine the relative abundance values of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. In experiments on two CRC-associated metagenomics datasets, the highest performance, an AUC of 0.90, is observed at the genus taxonomic level. At this level, 7 out of the top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas, and Solobacterium) were identified in both datasets. Moreover, the identified microorganisms at the genus, family, and order levels are thoroughly discussed by referring to the CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development, but also highlights the applicability of the taxonomy-based G-S-M method in tackling various diseases. © 2025 Elsevier B.V., All rights reserved.
Metabolomics Data Analysis to Discover Chronic Granulomatous Disease-Associated Biomarkers Utilizing G-S-M Machine Learning Model via Grouping Metabolites According to Ion Type
(Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Ersöz, Nur Sebnem; Bakir-Güngör, Burcu; Yousef, Malik
Chronic Granulomatous Disease (CGD) is a rare, inherited immunodeficiency disorder characterized by white blood cells unable to effectively kill certain bacteria and fungi. This defect results in the formation of clusters of immune cells called granulomas at sites of infection or inflammation. Therefore, identification of disease-related biomarkers is a critical step in advancing precision medicine and improving diagnostic accuracy. In this study, we applied a G-S-M machine learning approach to metabolomics data to uncover CGD-associated biomarkers. We obtained a metabolomics dataset from Gene Expression Omnibus under accession number GSE220260. The data include 85 samples (16 healthy controls and 69 CGD samples) with comprehensive metabolic profiles obtained using liquid chromatography-mass spectrometry analysis. The dataset lists metabolite names with their ion type and formula. To identify CGD-related metabolites and their ion types, we applied the G-S-M approach to the metabolomics data, grouping metabolites according to their ion type. In the training part of the G-S-M approach, metabolites annotated with selected ion types were used to perform a two-class classification task, which outputs a set of important ion types. We also compared the performance of the G-S-M machine learning model with traditional feature selection (TFS) methods (XGB, SKB, IG, FCBF, MRMR, and CMIM) combined with a random forest classifier. Monte Carlo cross-validation with 100 iterations was used in our experiments. The G-S-M, XGB, SKB, and FCBF methods provided similarly high performance. Besides its performance, the G-S-M method, unlike TFS, used groups based on ion types and then identified relevant CGD-associated metabolites. © 2024 Elsevier B.V., All rights reserved.
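The Monte Carlo cross-validation scheme mentioned in this abstract can be sketched as repeated random train/test partitioning. The sample count (85) and the number of iterations (100) come from the abstract; the 20% test fraction, the seed, and the function name `monte_carlo_splits` are assumptions made for illustration, and the sketch does not stratify by class.

```python
import random

def monte_carlo_splits(n_samples, n_iterations=100, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled indices form the test set, the rest the training set.
        yield shuffled[n_test:], shuffled[:n_test]

# 85 samples and 100 iterations as in the abstract; the test fraction is assumed.
splits = list(monte_carlo_splits(85, n_iterations=100, test_fraction=0.2))
train, test = splits[0]
print(len(splits), len(train), len(test))
```

Unlike k-fold cross-validation, each iteration draws a fresh random split, so a sample may appear in the test set of several iterations or of none. A model would be trained and evaluated once per split and the metrics averaged.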
Citation - WoS: 16
Citation - Scopus: 20
Machine Learning Analysis of Inflammatory Bowel Disease-Associated Metagenomics Dataset
(Institute of Electrical and Electronics Engineers Inc., 2018-09) Hacilar, Hilal; Nalbantoğlu, Özkan Ufuk; Bakir-Güngör, Burcu
There is an ongoing interplay between humans and our microbial communities. The microorganisms living in our gut produce energy from our food, strengthen our immune system, break down foreign products, and release metabolites and hormones, which are significant for regulating our physiology. Shifts away from this 'healthy' gut microbiome are considered to be associated with many diseases. Inflammatory bowel diseases (IBD), including Crohn's disease and ulcerative colitis, are gut-related disorders affecting the intestinal tract. Although some metagenomics studies have recently been conducted on IBD, our current understanding of the precise relationships between the human gut microbiome and IBD remains limited. In this regard, state-of-the-art machine learning approaches have become popular for addressing a variety of questions, such as the early diagnosis of certain diseases using human microbiota. In this study, we investigate which subset of the gut microbiota is most associated with IBD and whether disease-associated biomarkers can be detected by applying state-of-the-art machine learning algorithms and proper feature selection methods. © 2019 Elsevier B.V., All rights reserved.
