Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

Now showing 1 - 6 of 6
  • Conference Object
    Citation - Scopus: 2
    miRcorrNetPro: Unraveling Algorithmic Insights Through Cross-Validation in Multi-Omics Integration for Comprehensive Data Analysis
    (Institute of Electrical and Electronics Engineers Inc., 2023-12-05) Ünlü Yazici, Miray; Yousef, Malik; Marron, J. S.; Bakir-Güngör, Burcu; Yazici, Miray Unlu
    High-throughput omics technologies facilitate the investigation of regulatory mechanisms of complex diseases. Along this line, scientists develop promising tools and methods to extend our understanding at the molecular and functional levels. To this end, the miRcorrNet tool performs integrative analysis of microRNA (miRNA) and gene expression profiles via a machine learning (ML) approach to identify significant miRNA groups and their associated target genes. In this study, we propose the miRcorrNetPro tool, which extends miRcorrNet by tracking group scoring, ranking and other information through the cross-validation iterations. Heatmap visualizations enable deep novel insights into the collective behavior of clusters of groups in cellular signaling and hence facilitate detection of potential biomarkers for the disease under investigation. Although miRcorrNetPro is designed as a generic tool, here we present our findings and potential miRNA biomarkers for Breast Cancer (BRCA). The miRcorrNetPro tool and all other supplementary files are available at https://github.com/Miray-Unlu/miRcorrNetPro. © 2024 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - Scopus: 1
    TextNetTopics_TIS: Enhancing Textnettopics With Random Forest-Based Topic Importance Scoring
    (Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Voskergian, Daniel; Bakir-Güngör, Burcu; Yousef, Malik
    TextNetTopics is an innovative Latent Dirichlet Allocation-based topic selection method for training text classification models. One main limitation is its computationally intensive scoring mechanism, especially when applied to many topics. This scoring mechanism involves training a machine learning model (i.e., Random Forest) on each topic using the Monte Carlo cross-validation approach and assigning a score value based on a specific performance metric (e.g., accuracy or F1-score). Moreover, the measured score does not account for the interactions between all features residing in all topics. This paper presents a new topic-scoring mechanism called Topic Importance Scoring. This computationally efficient approach trains a Random Forest model on all topics simultaneously and leverages the extracted feature importance values to give each topic a score reflecting its classification potential. The experiments on three diverse datasets confirm that the proposed method's performance is superior to that of Topic Performance Scoring, which was used in the original TextNetTopics method. © 2024 Elsevier B.V., All rights reserved.
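The idea of scoring topics from feature importances can be illustrated with a minimal sketch. It assumes the per-word importance values have already been extracted from a Random Forest trained on all topics together, and it assumes that a topic's score is the sum of its words' importances; the abstract does not state the aggregation rule, so the sum, the function name `score_topics`, and all values below are hypothetical.

```python
def score_topics(feature_importance, topic_words):
    """Rank topics by the summed importance of their words.

    feature_importance: dict mapping word -> importance value
        (assumed to come from a model trained on all topics at once).
    topic_words: dict mapping topic name -> list of words in that topic.
    Summation is an illustrative assumption, not the published rule.
    """
    scores = {
        topic: sum(feature_importance.get(word, 0.0) for word in words)
        for topic, words in topic_words.items()
    }
    # Highest-scoring topic first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical importance values and topics, for illustration only.
importance = {"gene": 0.5, "cell": 0.2, "ball": 0.2, "team": 0.1}
topics = {"bio": ["gene", "cell"], "sport": ["ball", "team"]}
ranking = score_topics(importance, topics)
```

Because the model is trained once rather than once per topic, a scheme of this kind needs a single training run regardless of how many topics there are.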
  • Article
    Citation - Scopus: 2
    Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring (G-S-M) Approach
    (Elsevier Ltd, 2025-03) Bakir-Güngör, Burcu; Temiz, Mustafa; Canakcimaksutoglu, Beyza; Yousef, Malik
    Colorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and serves as a biomarker for early detection and treatment. This research effort focuses on the identification of potential taxonomic biomarkers of CRC using a grouping-based feature selection method. Additionally, this study investigates the effect of incorporating biological domain knowledge into the feature selection process while identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose a taxonomy-based Grouping Scoring Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. In this study, using metagenomic data related to CRC, classification is performed at three taxonomic levels (genus, family and order). The MetaPhlAn tool is employed to determine the relative abundance values of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. In experiments on two CRC-associated metagenomic datasets, the highest performance, an AUC of 0.90, is observed at the genus taxonomic level. At this level, 7 of the top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas and Solobacterium) were commonly identified for both datasets. Moreover, the identified microorganisms at the genus, family, and order levels are thoroughly discussed with reference to the CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development, but also highlights the applicability of the taxonomy-based G-S-M method in tackling various diseases. © 2025 Elsevier B.V., All rights reserved.
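The taxonomy-based grouping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes species features are named by pipe-separated lineage strings with rank prefixes such as `g__` and `s__` (the style used in MetaPhlAn output), and the function name `group_by_rank` and the example lineages are hypothetical.

```python
from collections import defaultdict


def group_by_rank(lineages, rank="g"):
    """Group species-level features by a higher taxonomic rank.

    lineages: iterable of strings such as
        "k__Bacteria|...|g__Fusobacterium|s__Fusobacterium_nucleatum".
    rank: one-letter rank prefix ("g" genus, "f" family, "o" order).
    Returns a dict mapping rank name -> list of member lineages.
    Lineages that lack the requested rank are skipped.
    """
    prefix = rank + "__"
    groups = defaultdict(list)
    for lineage in lineages:
        for clade in lineage.split("|"):
            if clade.startswith(prefix):
                groups[clade[len(prefix):]].append(lineage)
                break
    return dict(groups)


# Illustrative lineages; the exact taxonomy strings are assumptions.
features = [
    "k__Bacteria|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_nucleatum",
    "k__Bacteria|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_periodonticum",
    "k__Bacteria|f__Peptoniphilaceae|g__Parvimonas|s__Parvimonas_micra",
]
genus_groups = group_by_rank(features, rank="g")
```

Each resulting group (one per genus, family, or order) can then be scored as a unit, which is how domain knowledge enters the feature selection.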
  • Conference Object
    Metabolomics Data Analysis to Discover Chronic Granulomatous Disease-Associated Biomarkers Utilizing G-S-M Machine Learning Model via Grouping Metabolites According to Ion Type
    (Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Ersöz, Nur Sebnem; Bakir-Güngör, Burcu; Yousef, Malik
    Chronic Granulomatous Disease (CGD) is a rare, inherited immunodeficiency disorder characterized by white blood cells unable to effectively kill certain bacteria and fungi. This defect results in the formation of clusters of immune cells called granulomas that form at sites of infection or inflammation. Therefore, identification of disease-related biomarkers is a critical step in advancing precision medicine and improving diagnostic accuracy. In this study, we applied a G-S-M machine learning approach to metabolomics data to uncover CGD-associated biomarkers. We obtained a metabolomics dataset from Gene Expression Omnibus under accession number GSE220260. The data include 85 samples (16 healthy controls and 69 CGD samples) with comprehensive metabolic profiles obtained using liquid chromatography-mass spectrometry analysis. The dataset includes metabolite names with their ion type and formula. In order to identify CGD-related metabolites and their ion types, G-S-M was used as a grouping function when performing machine learning-oriented metabolomics data analysis. We performed the G-S-M approach by grouping metabolites according to their ion type. In the training part of the G-S-M approach, metabolites annotated with selected ion types were utilized to perform a two-class classification task, which outputs a set of important ion types. We also compared the performance results of the G-S-M machine learning model with traditional feature selection (TFS) methods: XGB, SKB, IG, FCBF, MRMR, and CMIM with a random forest classifier. Monte Carlo cross-validation with 100 iterations was used in our experiments. It was observed that the G-S-M, XGB, SKB and FCBF methods provided similarly high performance. In this study, besides its performance, the G-S-M method used groups based on ion types, unlike TFS, and then identified relevant Chronic Granulomatous Disease-associated metabolites. © 2024 Elsevier B.V., All rights reserved.
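Monte Carlo cross-validation repeatedly draws a fresh random train/test split instead of partitioning the data into fixed folds. The sketch below shows only the splitting logic, using the 85 samples and 100 iterations mentioned above; the 10% test fraction, the seed, and the function name `monte_carlo_splits` are assumptions, and no stratification by class is performed here.

```python
import random


def monte_carlo_splits(n_samples, n_iterations=100, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Each iteration shuffles all indices independently, so a sample may
    appear in the test set of several iterations or of none.
    test_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


splits = list(monte_carlo_splits(n_samples=85, n_iterations=100))
```

With a class imbalance such as 16 controls versus 69 cases, a stratified variant that samples each class separately would usually be preferable; that refinement is omitted to keep the sketch short.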
  • Article
    Citation - Scopus: 4
    CCPred: Global and Population-Specific Colorectal Cancer Prediction and Metagenomic Biomarker Identification at Different Molecular Levels Using Machine Learning Techniques
    (Elsevier Ltd, 2024-11) Bakir-Güngör, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre; Yousef, Malik
    Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet there is a need to improve their accuracy through a holistic biological knowledge perspective. This study aims to evaluate CRC-associated metagenomic data at the species, enzyme, and pathway levels by conducting global and population-specific analyses. These analyses utilize relative abundance values from human gut microbiome sequencing data, and robust classification models are built for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features that are identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO) and cross-population approaches. Four classification algorithms are employed for CRC classification. Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data and 0.76 for pathway data globally. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′ 3′ cyclic 3′ phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED. © 2024 Elsevier B.V., All rights reserved.
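Leave-one-dataset-out (LODO) evaluation holds out every sample from one cohort for testing and trains on all remaining cohorts, rotating through the cohorts in turn. The following sketch shows only the split generation; the function name `leave_one_dataset_out` and the cohort labels are hypothetical and do not come from the CCPred repository.

```python
def leave_one_dataset_out(dataset_of_sample):
    """Yield (held_out_name, train_indices, test_indices) per dataset.

    dataset_of_sample: list giving, for each sample index, the name of
        the dataset (cohort) that sample belongs to.
    Datasets are visited in sorted name order.
    """
    for held_out in sorted(set(dataset_of_sample)):
        train = [i for i, name in enumerate(dataset_of_sample) if name != held_out]
        test = [i for i, name in enumerate(dataset_of_sample) if name == held_out]
        yield held_out, train, test


# Hypothetical cohort labels for six samples.
cohorts = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_dataset_out(cohorts))
```

Unlike a within-population split, the test cohort here contributes nothing to training, so the resulting score reflects how well a model transfers to an unseen population.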
  • Conference Object
    Citation - Scopus: 1
    Integrative Analyses in Omics Data: Machine Learning Perspective
    (Deutsche Gesellschaft fur Medizinische Informatik, Biometrie und Epidemiologie e.V., 2023) Ünlü Yazici, Miray; Bakir-Güngör, Burcu; Yousef, Malik; Yazici, Miray Unlu
    Developments in high-throughput technologies have enabled the production of an immense amount of knowledge at the multi-omics level. For complex diseases that are affected by multiple factors, single-omics datasets might not be sufficient to unveil the molecular mechanisms of heterogeneous diseases. Providing a comprehensive and systematic overview to explain disease hallmarks in significant depth is critical. Utilizing multi-omics datasets has led to the development of a variety of tools and platforms. Machine learning models are utilized in a wide variety of tools to tackle the complexity of disorders and to identify new biomolecular signatures and potential markers. These approaches are based on training models to make predictions and classify the given data. In this review, we describe current machine learning-based approaches and available implementations. Challenges in elucidating the mechanisms of disease onset and progression, and the future development of the field of medicine, will be discussed. The importance of interpreting model output in light of corresponding biological knowledge will also be covered in this review. © 2023 Elsevier B.V., All rights reserved.