PubMed İndeksli Yayınlar Koleksiyonu
Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/397
Browse
6 results
Search Results
Article Citation - WoS: 15Citation - Scopus: 15PriPath: Identifying Dysregulated Pathways From Differential Gene Expression via Grouping, Scoring, and Modeling With an Embedded Feature Selection Approach(BMC, 2023-02-23) Yousef, Malik; Ozdemir, Fatma; Jaber, Amhar; Allmer, Jens; Bakir-Gungor, BurcuBackgroundCell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases.ResultsPriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research.ConclusionsPriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.Article Citation - Scopus: 2Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring (G-S-M) Approach(Elsevier Ltd, 2025-03) Bakir-Güngör, Burcu; Temiz, Mustafa; Canakcimaksutoglu, Beyza; Yousef, MalikColorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and serves as a biomarker for early detection and treatment. This research effort focuses on the identification of potential taxonomic biomarkers of CRC using a grouping-based feature selection method. Additionally, this study investigates the effect of incorporating biological domain knowledge into the feature selection process while identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose taxonomy-based Grouping Scoring Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. In this study, using metagenomic data related to CRC, classification is performed at three taxonomic levels (genus, family and order). The MetaPhlAn tool is employed to determine the relative abundance values of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. When experimented on two CRC associated metagenomics datasets, the highest performance metric, yielding an AUC of 0.90, is observed at the genus taxonomic level. At this level, 7 out of top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas and Solobacterium) were commonly identified for both datasets. Moreover, the identified microorganisms at genus, family, and order levels are thoroughly discussed via refering to CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development, but also highlights the applicability of taxonomy-based G-S-M method in tackling various diseases. © 2025 Elsevier B.V., All rights reserved.Article Citation - WoS: 5Citation - Scopus: 5Novel Antimicrobial Peptide Design Using Motif Match Score Representation(IEEE Computer Soc, 2024-11) Soylemez, Ummu Gulsum; Yousef, Malik; Kesmen, Zulal; Bakir-Gungor, BurcuAntimicrobial peptides (AMPs) have drawn the interest of the researchers since they offer an alternative to the traditional antibiotics in the fight against antibiotic resistance and they exhibit additional pharmaceutically significant properties. Recently, computational approaches attemp to reveal how antibacterial activity is determined from a machine learning perspective and they aim to search and find the biological cues or characteristics that control antimicrobial activity via incorporating motif match scores. This study is dedicated to the development of a machine learning framework aimed at devising novel antimicrobial peptide (AMP) sequences potentially effective against Gram-positive/Gram-negative bacteria. In order to design newly generated sequences classified as either AMP or non-AMP, various classification models were trained. These novel sequences underwent validation utilizing the "DBAASP: strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented herein represent a significant stride in this computational research, streamlining the process of AMP creation or modification within wet lab environments.Article Citation - WoS: 9Citation - Scopus: 15MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach(Frontiers Media S.A., 2023-11-22) Bakir-Gungor, Burcu; Temiz, Mustafa; Jabeer, Amhar; Wu, Di; Yousef, MalikNumerous biological environments have been characterized with the advent of metagenomic sequencing using next generation sequencing which lays out the relative abundance values of microbial taxa. Modeling the human microbiome using machine learning models has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis. The microBiomeGSM tool presented in this work incorporates the pre-existing taxonomy information into a machine learning approach and challenges to solve the classification problem in metagenomics disease-associated datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species level information is used as features and classified by relating their taxonomic features at different levels, including genus, family, and order. Using four different disease associated metagenomics datasets, the performance of microBiomeGSM is comparatively evaluated with other feature selection methods such as Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Maximum Likelihood and Minimum Redundancy (MRMR) and Information Gain (IG), also with other classifiers such as AdaBoost, Decision Tree, LogitBoost and Random Forest. microBiomeGSM achieved the highest results with an Area under the curve (AUC) value of 0.98% at the order taxonomic level for IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups that are identified as important for the disease under study and the names of the species within these groups. The association between the detected species and the disease under investigation is confirmed by previous studies in the literature. The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM.Article Citation - WoS: 25Citation - Scopus: 31Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods(Frontiers Media S.A., 2021-08-25) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, MalikHuman gut microbiota is a complex community of organisms including trillions of bacteria. While these microorganisms are considered as essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies accelerated the discovery of human gut microbiota. In this respect, the use of machine learning techniques became popular to analyze disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease and affects millions of people around the world. Since the early diagnosis in T2D is important for effective treatment, there is an utmost need to develop a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species are used to train and test our model. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and select K best approach. To test the performance of the classification based on the features that are selected by different methods, we used random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of minimizing the microbiota used for the diagnosis of T2D and thus reducing the time and cost. When we perform biological validation of these identified species, we found that some of them are known as related to T2D development mechanisms and we identified additional species as potential biomarkers. Additionally, we attempted to find the subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and finds out which subset of microbiota is more informative than other taxa by applying state-of-the art feature selection methods.</p>Article Citation - Scopus: 4CCPred: Global and Population-Specific Colorectal Cancer Prediction and Metagenomic Biomarker Identification at Different Molecular Levels Using Machine Learning Techniques(Elsevier Ltd, 2024-11) Bakir-Güngör, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre; Yousef, MalikColorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet there is a need to improve their accuracy through a holistic biological knowledge perspective. This study aims to evaluate CRC-associated metagenomic data at species, enzymes, and pathway levels via conducting global and population-specific analyses. These analyses utilize relative abundance values from human gut microbiome sequencing data and robust classification models are built for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features that are identified by SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO) and cross-population approaches. Four classification algorithms are employed for CRC classification. Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data and 0.76 for pathway data globally. On the global scale, potential taxonomic biomarkers include ruthenibacterium lactatiformanas; enzyme biomarkers include RNA 2′ 3′ cyclic 3′ phosphodiesterase; and pathway biomarkers include pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED. © 2024 Elsevier B.V., All rights reserved.
