Scopus İndeksli Yayınlar Koleksiyonu

Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/395

Browse

Search Results

Now showing 1 - 10 of 14
  • Article
    Citation - WoS: 2
    Citation - Scopus: 4
    RCE-IFE: Recursive Cluster Elimination With Intra-Cluster Feature Elimination
    (PeerJ Inc, 2025-02-07) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, Malik
    The computational and interpretational difficulties caused by the ever-increasing dimensionality of biological data generated by new technologies pose a significant challenge. Feature selection (FS) methods aim to reduce the dimension, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant features. In this work, we propose the Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE) method that utilizes feature grouping and iterates grouping and elimination steps in a supervised context. We assess dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains. For a set of gene expression, MicroRNA (miRNA) expression, and methylation datasets, the performance of RCE-IFE is comparatively evaluated with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. On average, RCE-IFE attains an area under the curve (AUC) of 0.85 among tested expression datasets with the fewest features and the shortest running time, while RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively when averaged over seven different metagenomics datasets, with RCE-IFE significantly reducing feature subsets. Furthermore, RCE-IFE surpasses several state-of-the-art FS methods, such as Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE gives a similar average accuracy rate of 89.27% using fewer features on four cancer-related datasets. The comparability of RCE-IFE is also verified with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet. Additionally, the biological relevance of the selected features by RCE-IFE is evaluated. The proposed method also exhibits high consistency in terms of the selected features across multiple runs. Our experimental findings imply that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.
  • Article
    Citation - WoS: 24
    Citation - Scopus: 27
    PANOGA: a Web Server for Identification of SNP-Targeted Pathways From Genome-Wide Association Study Data
    (Oxford Univ Press, 2014-01-11) Bakir-Gungor, Burcu; Egemen, Ece; Sezerman, Osman Ugur
    Genome-wide association studies (GWAS) have revolutionized the search for the variants underlying human complex diseases. However, in a typical GWAS, only a minority of the single-nucleotide polymorphisms (SNPs) with the strongest evidence of association is explained. One possible reason of complex diseases is the alterations in the activity of several biological pathways. Here we present a web server called Pathway and Network-Oriented GWAS Analysis to devise functionally important pathways through the identification of SNP-targeted genes within these pathways. The strength of our methodology stems from its multidimensional perspective, where we combine evidence from the following five resources: (i) genetic association information obtained through GWAS, (ii) SNP functional information, (iii) protein-protein interaction network, (iv) linkage disequilibrium and (v) biochemical pathways.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 7
    Network Anomaly Detection Using Deep Autoencoder and Parallel Artificial Bee Colony Algorithm-Trained Neural Network
    (PeerJ Inc, 2024-10-08) Hacilar, Hilal; Dedeturk, Bilge Kagan; Bakir-Gungor, Burcu; Gungor, Vehbi Cagri
    Cyberattacks are increasingly becoming more complex, which makes intrusion detection extremely difficult. Several intrusion detection approaches have been developed in the literature and utilized to tackle computer security intrusions. Implementing machine learning and deep learning models for network intrusion detection has been a topic of active research in cybersecurity. In this study, artificial neural networks (ANNs), a type of machine learning algorithm, are employed to determine optimal network weight sets during the training phase. Conventional training algorithms, such as back- propagation, may encounter challenges in optimization due to being entrapped within local minima during the iterative optimization process; global search strategies can be slow at locating global minima, and they may suffer from a low detection rate. In the ANN training, the Artificial Bee Colony (ABC) algorithm enables the avoidance of local minimum solutions by conducting a high-performance search in the solution space but it needs some modifications. To address these challenges, this work suggests a Deep Autoencoder (DAE)-based, vectorized, and parallelized ABC algorithm for training feed-forward artificial neural networks, which is tested on the UNSW-NB15 and NF-UNSW-NB15-v2 datasets. Our experimental results demonstrate that the proposed DAE-based parallel ABC-ANN outperforms existing metaheuristics, showing notable improvements in network intrusion detection. The experimental results reveal a notable improvement in network intrusion detection through this proposed approach, exhibiting an increase in detection rate (DR) by 0.76 to 0.81 and a reduction in false alarm rate (FAR) by 0.016 to 0.005 compared to the ANN-BP algorithm on the UNSWNB15 dataset. Furthermore, there is a reduction in FAR by 0.006 to 0.0003 compared to the ANN-BP algorithm on the NF-UNSW-NB15-v2 dataset. These findings underscore the effectiveness of our proposed approach in enhancing network security against network intrusions.
  • Article
    Citation - WoS: 9
    Citation - Scopus: 15
    MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach
    (Frontiers Media S.A., 2023-11-22) Bakir-Gungor, Burcu; Temiz, Mustafa; Jabeer, Amhar; Wu, Di; Yousef, Malik
    Numerous biological environments have been characterized with the advent of metagenomic sequencing using next generation sequencing which lays out the relative abundance values of microbial taxa. Modeling the human microbiome using machine learning models has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis. The microBiomeGSM tool presented in this work incorporates the pre-existing taxonomy information into a machine learning approach and challenges to solve the classification problem in metagenomics disease-associated datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species level information is used as features and classified by relating their taxonomic features at different levels, including genus, family, and order. Using four different disease associated metagenomics datasets, the performance of microBiomeGSM is comparatively evaluated with other feature selection methods such as Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Maximum Likelihood and Minimum Redundancy (MRMR) and Information Gain (IG), also with other classifiers such as AdaBoost, Decision Tree, LogitBoost and Random Forest. microBiomeGSM achieved the highest results with an Area under the curve (AUC) value of 0.98% at the order taxonomic level for IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups that are identified as important for the disease under study and the names of the species within these groups. The association between the detected species and the disease under investigation is confirmed by previous studies in the literature. The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM.
  • Article
    Citation - WoS: 5
    Citation - Scopus: 4
    Investigating Strain Rate Effects on Damage Mechanisms in Hybrid Laminated Composites Using Acoustic Emission
    (Elsevier Sci Ltd, 2025-12) Gulsen, Abdulkadir; Kolukisa, Burak; Etcil, Mustafa; Caliskan, Umut; Zafar, Hafiz Muhammad Numan; Demirbas, Munise Didem; Bakir-Gungor, Burcu
    Hybrid composites, which combine distinct fiber types such as carbon, basalt, and aramid, provide a synergistic balance of strength, stiffness, impact resistance, and energy dissipation, making them appealing for critical applications in aerospace, automotive, and other high-performance industries. Monitoring damage progression in these composites is vital for ensuring structural integrity and preventing catastrophic failures. Acoustic emission (AE) serves as a powerful, noninvasive technique for real-time structural health monitoring, capturing the transient stress waves generated when damage events occur. This study utilizes AE to examine the influence of strain rate on damage modes in carbon/basalt/aramid hybrid composites under three-point bending. An unsupervised feature selection based on Laplacian scores is employed to identify the most relevant AE features with damage modes, while SHapley Additive Explanations (SHAP) are used to evaluate the correlation between AE features and strain rates. The correlation analysis results indicate that peak frequency (PF) serves as a key indicator, demonstrating significant shifts at higher strain rates. Gaussian Mixture Model (GMM) clustering is used to analyze hybrid composites by examining clustered AE signals based on selected features identified through Laplacian scores, with Silhouette scores employed to determine the optimal number of clusters. This study highlights the role of AE in understanding fiber interactions and damage evolution, offering valuable insights into the mechanical performance and optimization of carbon/basalt/aramid hybrid composite structures.
  • Article
    Citation - WoS: 25
    Citation - Scopus: 28
    Identification of Possible Pathogenic Pathways in Behcet's Disease Using Genome-Wide Association Study Data From Two Different Populations
    (Nature Publishing Group, 2014-09-17) Bakir-Gungor, Burcu; Remmers, Elaine F.; Meguro, Akira; Mizuki, Nobuhisa; Kastner, Daniel L.; Gul, Ahmet; Sezerman, Osman U.
    Behcet's disease (BD) is a multi-system inflammatory disorder of unknown etiology. Two recent genome-wide association studies (GWASs) of BD confirmed a strong association with the MHC class I region and identified two non-HLA common genetic variations. In complex diseases, multiple factors may target different sets of genes in the same pathway and thus may cause the same disease phenotype. We therefore hypothesized that identification of disease-associated pathways is critical to elucidate mechanisms underlying BD, and those pathways may be conserved within and across populations. To identify the disease-associated pathways, we developed a novel methodology that combines nominally significant evidence of genetic association with current knowledge of biochemical pathways, protein-protein interaction networks, and functional information of selected SNPs. Using this methodology, we searched for the disease-related pathways in two BD GWASs in Turkish and Japanese case-control groups. We found that 6 of the top 10 identified pathways in both populations were overlapping, even though there were few significantly conserved SNPs/genes within and between populations. The probability of random occurrence of such an event was 2.24E -39. These shared pathways were focal adhesion, MAPK signaling, TGF-beta signaling, ECM-receptor interaction, complement and coagulation cascades, and proteasome pathways. Even though each individual has a unique combination of factors involved in their disease development, the targeted pathways are expected to be mostly the same. Hence, the identification of shared pathways between the Turkish and the Japanese patients using GWAS data may help further elucidate the inflammatory mechanisms in BD pathogenesis.
  • Article
    Citation - WoS: 42
    Citation - Scopus: 42
    HomSI: A Homozygous Stretch Identifier From Next-Generation Sequencing Data
    (Oxford Univ Press, 2013-12-03) Gormez, Zeliha; Bakir-Gungor, Burcu; Sagiroglu, Mahmut Samil
    In consanguineous families, as a result of inheriting the same genomic segments through both parents, the individuals have stretches of their genomes that are homozygous. This situation leads to the prevalence of recessive diseases among the members of these families. Homozygosity mapping is based on this observation, and in consanguineous families, several recessive disease genes have been discovered with the help of this technique. The researchers typically use single nucleotide polymorphism arrays to determine the homozygous regions and then search for the disease gene by sequencing the genes within this candidate disease loci. Recently, the advent of next-generation sequencing enables the concurrent identification of homozygous regions and the detection of mutations relevant for diagnosis, using data from a single sequencing experiment. In this respect, we have developed a novel tool that identifies homozygous regions using deep sequence data. Using*.vcf (variant call format) files as an input file, our program identifies the majority of homozygous regions found by microarray single nucleotide polymorphism genotype data.
  • Letter
    Epistatic Interactions Between Autoimmunity and Genetic Thrombophilia' Reply
    (Nature Publishing Group, 2015-01-14) Bakir-Gungor, Burcu; Remmers, Elaine F.; Meguro, Akira; Mizuki, Nobuhisa; Kastner, Daniel L.; Gul, Ahmet; Sezerman, Osman Ugur
  • Article
    Citation - WoS: 40
    Citation - Scopus: 65
    Ensemble Feature Selection and Classification Methods for Machine Learning-Based Coronary Artery Disease Diagnosis
    (Elsevier, 2023-03) Kolukisa, Burak; Bakir-Gungor, Burcu
    Coronary artery disease (CAD) is a condition in which the heart is not fed sufficiently as a result of the accumulation of fatty matter. As reported by the World Health Organization, around 32% of the total deaths in the world are caused by CAD, and it is estimated that approximately 23.6 million people will die from this disease in 2030. CAD develops over time, and the diagnosis of this disease is difficult until a blockage or a heart attack occurs. In order to bypass the side effects and high costs of the current methods, researchers have proposed to diagnose CADs with computer-aided systems, which analyze some physical and biochemical values at a lower cost. In this study, for the CAD diagnosis, (i) seven different computational feature selection (FS) methods, one domain knowledge-based FS method, and different classification algorithms have been evaluated; (ii) an exhaustive ensemble FS method and a probabilistic ensemble FS method have been proposed. The proposed approach is tested on three publicly available CAD data sets using six different classification algorithms and four different variants of voting algorithms. The performance metrics have been comparatively evaluated with numerous combinations of classifiers and FS methods. The multi-layer perceptron classifier obtained satisfactory results on three data sets. Performance evaluations show that the proposed approach resulted in 91.78%, 85.55%, and 85.47% accuracy for the Z-Alizadeh Sani, Statlog, and Cleveland data sets, respectively.
  • Article
    Citation - WoS: 25
    Citation - Scopus: 31
    Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods
    (Frontiers Media S.A., 2021-08-25) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, Malik
    Human gut microbiota is a complex community of organisms including trillions of bacteria. While these microorganisms are considered as essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies accelerated the discovery of human gut microbiota. In this respect, the use of machine learning techniques became popular to analyze disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease and affects millions of people around the world. Since the early diagnosis in T2D is important for effective treatment, there is an utmost need to develop a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species are used to train and test our model. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and select K best approach. To test the performance of the classification based on the features that are selected by different methods, we used random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of minimizing the microbiota used for the diagnosis of T2D and thus reducing the time and cost. When we perform biological validation of these identified species, we found that some of them are known as related to T2D development mechanisms and we identified additional species as potential biomarkers. Additionally, we attempted to find the subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and finds out which subset of microbiota is more informative than other taxa by applying state-of-the art feature selection methods.</p>