WoS Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/394
12 results
Search Results
Article | Citation - WoS: 6 | Citation - Scopus: 7
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
(IEEE Computer Soc, 2022-05-01) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors both play roles in the development of BD. To find the underlying causes, genetic variation across thousands of genes must be analyzed, and further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomic data and for selecting representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for clinical diagnosis. In this study, we attempted to identify representative SNPs using feature selection methods and incorporating biological information, with the aim of developing a machine learning model for diagnosing Behcet's disease. By combining biological information with machine learning classifiers, a disease prediction accuracy of up to 99.64 percent is achieved using only 13,611 of 311,459 SNPs. In addition, we revealed the most distinctive SNPs by performing repeated feature selection in cross-validation experiments.

Article | Citation - WoS: 15 | Citation - Scopus: 15
PriPath: Identifying Dysregulated Pathways From Differential Gene Expression via Grouping, Scoring, and Modeling With an Embedded Feature Selection Approach
(BMC, 2023-02-23) Yousef, Malik; Ozdemir, Fatma; Jaber, Amhar; Allmer, Jens; Bakir-Gungor, Burcu
Background: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks.
Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches rely only on gene expression data and pay too little attention to the knowledge already stored in KEGG pathways when detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases.
Results: PriPath is a novel approach that applies the generic process of grouping and scoring, followed by modeling, to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is then trained on those groups to differentiate between disease and control samples. We tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We comparatively evaluated the performance of PriPath against similar tools. For each dataset, we manually checked the top results of PriPath against the literature and found that most predictions are supported by previous experimental research.
Conclusions: PriPath can thus aid in determining dysregulated pathways, which has applications in medical diagnostics.
In the future, we aim to extend this approach so that it can perform patient stratification based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine.

Article | Citation - WoS: 24 | Citation - Scopus: 27
PANOGA: a Web Server for Identification of SNP-Targeted Pathways From Genome-Wide Association Study Data
(Oxford Univ Press, 2014-01-11) Bakir-Gungor, Burcu; Egemen, Ece; Sezerman, Osman Ugur
Genome-wide association studies (GWAS) have revolutionized the search for the variants underlying complex human diseases. However, in a typical GWAS, only a minority of the single-nucleotide polymorphisms (SNPs) with the strongest evidence of association is explained. One possible cause of complex diseases is altered activity in several biological pathways. Here we present a web server called Pathway and Network-Oriented GWAS Analysis (PANOGA) to identify functionally important pathways through the identification of SNP-targeted genes within these pathways. The strength of our methodology stems from its multidimensional perspective, where we combine evidence from the following five resources: (i) genetic association information obtained through GWAS, (ii) SNP functional information, (iii) protein-protein interaction networks, (iv) linkage disequilibrium and (v) biochemical pathways.

Article | Citation - WoS: 5 | Citation - Scopus: 5
Novel Antimicrobial Peptide Design Using Motif Match Score Representation
(IEEE Computer Soc, 2024-11) Soylemez, Ummu Gulsum; Yousef, Malik; Kesmen, Zulal; Bakir-Gungor, Burcu
Antimicrobial peptides (AMPs) have drawn researchers' interest because they offer an alternative to traditional antibiotics in the fight against antibiotic resistance and exhibit additional pharmaceutically significant properties.
Recently, computational approaches have attempted to reveal, from a machine learning perspective, how antibacterial activity is determined, and to find the biological cues or characteristics that control antimicrobial activity by incorporating motif match scores. This study develops a machine learning framework for designing novel AMP sequences that are potentially effective against Gram-positive or Gram-negative bacteria. Various classification models were trained to classify newly generated sequences as either AMP or non-AMP. These novel sequences were then validated using the "DBAASP: strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented herein represent a significant step in this computational research, streamlining the creation or modification of AMPs in wet lab environments.

Article | Citation - WoS: 9 | Citation - Scopus: 15
MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach
(Frontiers Media S.A., 2023-11-22) Bakir-Gungor, Burcu; Temiz, Mustafa; Jabeer, Amhar; Wu, Di; Yousef, Malik
With the advent of metagenomic sequencing based on next-generation sequencing, which reveals the relative abundance values of microbial taxa, numerous biological environments have been characterized. Modeling the human microbiome with machine learning has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis.
The microBiomeGSM tool presented in this work incorporates pre-existing taxonomy information into a machine learning approach and attempts to solve the classification problem in disease-associated metagenomics datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species-level information is used as features, which are grouped according to their taxonomy at different levels, including genus, family, and order. Using four disease-associated metagenomics datasets, the performance of microBiomeGSM is comparatively evaluated against other feature selection methods such as Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Maximum Relevance and Minimum Redundancy (MRMR), and Information Gain (IG), and against other classifiers such as AdaBoost, Decision Tree, LogitBoost, and Random Forest. microBiomeGSM achieved its best result, an area under the curve (AUC) of 0.98, at the order taxonomic level for the IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups identified as important for the disease under study, together with the names of the species within these groups. The association between the detected species and the disease under investigation is confirmed by previous studies in the literature.
The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM.

Article | Citation - WoS: 5 | Citation - Scopus: 4
Investigating Strain Rate Effects on Damage Mechanisms in Hybrid Laminated Composites Using Acoustic Emission
(Elsevier Sci Ltd, 2025-12) Gulsen, Abdulkadir; Kolukisa, Burak; Etcil, Mustafa; Caliskan, Umut; Zafar, Hafiz Muhammad Numan; Demirbas, Munise Didem; Bakir-Gungor, Burcu
Hybrid composites, which combine distinct fiber types such as carbon, basalt, and aramid, provide a synergistic balance of strength, stiffness, impact resistance, and energy dissipation, making them appealing for critical applications in aerospace, automotive, and other high-performance industries. Monitoring damage progression in these composites is vital for ensuring structural integrity and preventing catastrophic failures. Acoustic emission (AE) is a powerful, noninvasive technique for real-time structural health monitoring that captures the transient stress waves generated when damage events occur. This study uses AE to examine the influence of strain rate on damage modes in carbon/basalt/aramid hybrid composites under three-point bending. Unsupervised feature selection based on Laplacian scores is employed to identify the AE features most relevant to damage modes, while SHapley Additive exPlanations (SHAP) are used to evaluate the correlation between AE features and strain rates. The correlation analysis indicates that peak frequency (PF) is a key indicator, shifting significantly at higher strain rates. Gaussian Mixture Model (GMM) clustering is applied to the AE signals using the features selected by the Laplacian scores, with Silhouette scores used to determine the optimal number of clusters.
This study highlights the role of AE in understanding fiber interactions and damage evolution, offering valuable insights into the mechanical performance and optimization of carbon/basalt/aramid hybrid composite structures.

Article | Citation - WoS: 25 | Citation - Scopus: 28
Identification of Possible Pathogenic Pathways in Behcet's Disease Using Genome-Wide Association Study Data From Two Different Populations
(Nature Publishing Group, 2014-09-17) Bakir-Gungor, Burcu; Remmers, Elaine F.; Meguro, Akira; Mizuki, Nobuhisa; Kastner, Daniel L.; Gul, Ahmet; Sezerman, Osman U.
Behcet's disease (BD) is a multi-system inflammatory disorder of unknown etiology. Two recent genome-wide association studies (GWASs) of BD confirmed a strong association with the MHC class I region and identified two non-HLA common genetic variants. In complex diseases, multiple factors may target different sets of genes in the same pathway and thus cause the same disease phenotype. We therefore hypothesized that identifying disease-associated pathways is critical to elucidating the mechanisms underlying BD, and that those pathways may be conserved within and across populations. To identify the disease-associated pathways, we developed a novel methodology that combines nominally significant evidence of genetic association with current knowledge of biochemical pathways, protein-protein interaction networks, and functional information on selected SNPs. Using this methodology, we searched for disease-related pathways in two BD GWASs of Turkish and Japanese case-control groups. We found that 6 of the top 10 pathways identified in each population overlapped, even though few SNPs/genes were significantly conserved within and between populations. The probability of such an overlap occurring by chance was 2.24E-39. These shared pathways were focal adhesion, MAPK signaling, TGF-beta signaling, ECM-receptor interaction, complement and coagulation cascades, and proteasome pathways.
Even though each individual has a unique combination of factors involved in disease development, the targeted pathways are expected to be mostly the same. Hence, the identification of shared pathways between Turkish and Japanese patients using GWAS data may help further elucidate the inflammatory mechanisms in BD pathogenesis.

Article | Citation - WoS: 42 | Citation - Scopus: 42
HomSI: A Homozygous Stretch Identifier From Next-Generation Sequencing Data
(Oxford Univ Press, 2013-12-03) Gormez, Zeliha; Bakir-Gungor, Burcu; Sagiroglu, Mahmut Samil
In consanguineous families, individuals inherit the same genomic segments through both parents, so stretches of their genomes are homozygous. This leads to the prevalence of recessive diseases among the members of these families. Homozygosity mapping is based on this observation, and several recessive disease genes have been discovered in consanguineous families with the help of this technique. Researchers typically use single nucleotide polymorphism arrays to determine the homozygous regions and then search for the disease gene by sequencing the genes within these candidate disease loci. More recently, next-generation sequencing has enabled the concurrent identification of homozygous regions and the detection of mutations relevant for diagnosis, using data from a single sequencing experiment. In this respect, we have developed a novel tool that identifies homozygous regions using deep sequence data.
Using *.vcf (variant call format) files as input, our program identifies the majority of the homozygous regions found by microarray single nucleotide polymorphism genotype data.

Letter
'Epistatic Interactions Between Autoimmunity and Genetic Thrombophilia' Reply
(Nature Publishing Group, 2015-01-14) Bakir-Gungor, Burcu; Remmers, Elaine F.; Meguro, Akira; Mizuki, Nobuhisa; Kastner, Daniel L.; Gul, Ahmet; Sezerman, Osman Ugur

Article | Citation - WoS: 25 | Citation - Scopus: 31
Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods
(Frontiers Media S.A., 2021-08-25) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, Malik
The human gut microbiota is a complex community of organisms that includes trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies have accelerated the discovery of the human gut microbiota, and machine learning techniques have become popular for analyzing disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is a pressing need for a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model that facilitates T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized by disease state. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model.
To deal with the high dimensionality of the features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the Select K Best approach. To test classification performance on the features selected by each method, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features considerably reduce the set of microbiota needed to diagnose T2D, thereby reducing time and cost. In the biological validation of these species, we found that some of them are already known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers. Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study uses several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and identifies which subset of the microbiota is more informative than other taxa by applying state-of-the-art feature selection methods.
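The evaluation scheme described in the T2D abstract, Monte Carlo cross-validation over the features chosen by several selection methods, can be illustrated with a small dependency-free sketch. Monte Carlo cross-validation draws a fresh random train/test split on every iteration and averages the scores; the "commonly selected" features are read here as the intersection of the methods' selections, which is an assumption. This is not the authors' pipeline: the random forest is swapped for a simple nearest-centroid classifier to avoid external libraries, and the 80/20 split, the seed, the function names (`monte_carlo_cv`, `common_features`) and the toy data are all hypothetical.

```python
import math
import random
from statistics import mean


def fit_centroids(X, y):
    """Return the mean feature vector for each class label in the training data."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def monte_carlo_cv(X, y, n_splits=100, test_fraction=0.2, seed=0):
    """Repeated random train/test splits; returns (mean accuracy, per-split accuracies).

    Assumption: a nearest-centroid classifier stands in for the random forest
    used in the study, and the 80/20 split ratio is illustrative only.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split on every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
        scores.append(correct / n_test)
    return mean(scores), scores


def common_features(selections):
    """Features chosen by every feature selection method (hypothetical reading of 'commonly selected')."""
    return set.intersection(*(set(features) for features in selections.values()))


if __name__ == "__main__":
    # Toy, clearly separable data: two classes, two features (hypothetical values).
    X = [[i * 0.1, i * 0.1] for i in range(20)] + [[10 + i * 0.1, 10 + i * 0.1] for i in range(20)]
    y = [0] * 20 + [1] * 20
    mean_acc, scores = monte_carlo_cv(X, y, n_splits=100)
    print(len(scores), mean_acc)  # 100 1.0
    picks = {"CMIM": ["sp_a", "sp_b", "sp_c"], "MRMR": ["sp_b", "sp_c", "sp_d"], "SKB": ["sp_c", "sp_b"]}
    print(sorted(common_features(picks)))  # ['sp_b', 'sp_c']
```

Because each iteration reshuffles the whole index list, test sets from different iterations may overlap, which is what distinguishes Monte Carlo cross-validation from k-fold cross-validation.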
