WoS Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/394

Search Results

Now showing 1 - 10 of 10
  • Conference Object
    Enhancing Complex Disease Group Scoring with miRGediNET: A Multi-Algorithm Machine Learning Framework Based on the GSM Approach
    (IEEE, 2025-06-25) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
    Integrating biological prior knowledge on disease-gene associations has shown significant promise in discovering new biomarkers with potential translational applications. This work investigates a multi-algorithm machine learning framework based on the Grouping-Scoring-Modeling (G-S-M) approach for improving the prediction of complex diseases. The study identifies the primary gene and miRNA interactions in various complex diseases with the help of miRGediNET, a machine learning-based tool that integrates data from three biological databases. Whereas traditional methods treat features as independent, the G-S-M method aggregates genes into groups based on biological interactions, scores each gene group for a disease, and models its predictive capability using machine learning algorithms. Seven algorithms, including Support Vector Machine, Decision Tree, and CatBoost, were applied to eight datasets extracted from the GEO database. Evaluated with 100-fold randomized cross-validation, the framework proved robust in ranking gene clusters and thereby in predicting critical biomarkers. The results indicate the approach's high potential for refining disease prediction and for helping researchers choose the algorithm that best provides biological insights and computational advances.
  • Conference Object
    Exploring Microbiome Signatures in Autism Spectrum Disorder via Grouping-Scoring Based Machine Learning
    (IEEE, 2025-06-25) Temiz, Mustafa; Ersoz, Nur Sebnem; Yousef, Malik; Bakir-Gungor, Burcu
    The rapid increase in omics data production has increased the importance of machine learning (ML) methods for analyzing these data. In particular, the use of metagenomic data in the diagnosis, prognosis and treatment of diseases is becoming widespread. Autism Spectrum Disorder (ASD) is a neurodevelopmental disease that emerges in early childhood and persists throughout life. The aim of this study is to increase ML performance, reduce computational costs and achieve successful classification performance using a small number of metagenomic features. In addition, disease prediction is performed and ASD-associated biomarkers are determined by applying microBiomeGSM to metagenomic data. Classification is performed at three different taxonomic levels (genus, family and order) using the relative abundance values of species. The best performance (0.95 AUC) was obtained at the order taxonomic level using an average of 416 features with microBiomeGSM. The identified ASD-related taxonomic species are presented.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    Prediction of Type 2 Diabetes Using Metagenomic Data and Identification of Taxonomic Biomarkers
    (IEEE, 2024-05-15) Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
    Omics data on diseases are now generated at different molecular levels, and analyzing these data with machine learning methods is a popular research topic. Among these data, metagenomic data are increasingly used to facilitate the diagnosis, detection and treatment of diseases. Type 2 diabetes (T2D) is a chronic disease characterized by insulin resistance and progressive dysfunction of pancreatic beta cells. While the number of people with diabetes is increasing by around 8% annually, the cost of treating the disease is rising by 18% per year. Therefore, the number of studies on the diagnosis, development and progression of T2D is increasing over time. The aim of this study is to achieve higher machine learning performance with fewer metagenomic features and to achieve better classification performance while reducing computational costs. We compare the performance of three different methods using T2D-related metagenomic data. First, the MetaPhlAn tool is used to calculate the taxonomic species and their relative abundances in each sample. The SVM-RCE, RCE-IFE and microBiomeGSM tools used in this study perform classification by grouping and scoring features and are known to work well on complex datasets. The best results were obtained with the RCE-IFE tool, which reached an AUC of 0.72 using an average of 125 features. In addition, key taxonomic species identified by these tools as associated with T2D are presented and compared with the literature.
  • Conference Object
    Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes
    (IEEE, 2024-05-15) Ersoz, Nur Sebnem; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
    The human gut microbiome comprises over 10 trillion microbes and plays important roles in maintaining metabolism and body homeostasis and in shaping immune function. Metagenomics, which studies genomic data from clinical and environmental samples, is crucial for understanding the interplay between the host and the gut microbiome. Recently, functional profiling of metagenomes has helped to identify alterations in microbial functions, particularly in enzyme-encoding genes. Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths. In this study, we aimed to find CRC-associated enzymes by analyzing metagenomic data with different machine learning methods. A total of 1262 samples, including CRC and control groups from different countries, were used. The dataset was obtained by functionally profiling metagenomic data and estimating community-level enzyme commission (EC) abundance values. To analyze this dataset, the RCE-IFE and SVM-RCE machine learning methods, which are group-based feature selection methods, were compared with 6 different individual feature selection methods. Monte Carlo cross-validation with 10 repetitions was used in our experiments. RCE-IFE, Extreme Gradient Boosting and Select K Best provided similar, best-performing results. Notably, besides its high performance, the group-based feature selection method RCE-IFE grouped enzymes into clusters, unlike TFS, and then identified biologically relevant CRC-associated enzymes.
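    The Monte Carlo cross-validation mentioned in this abstract can be sketched as repeated random train/test splits. This is a minimal illustration, not the authors' pipeline: the function name `monte_carlo_splits`, the 10% test fraction, and the seed are assumptions; only the sample count (1262) and the number of repetitions (10) come from the abstract.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=10, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Unlike k-fold cross-validation, each repetition draws a fresh random
    test set, so test sets from different repetitions may overlap.
    The test_fraction default is an assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First n_test shuffled indices form the test set, the rest train.
        yield shuffled[n_test:], shuffled[:n_test]


# 1262 samples and 10 repetitions, as stated in the abstract.
splits = list(monte_carlo_splits(n_samples=1262, n_repeats=10))
```

    In practice a model would be fitted on each training split and scored on the matching test split, with the metric averaged over the repetitions.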
  • Conference Object
    Graph-Based Biomedical Knowledge Discovery
    (IEEE, 2024-05-15) Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gokhan
    Digitalization is progressing rapidly all over the world. While it brings many conveniences to daily life, it also raises the problem of analyzing and processing huge volumes of digital data. This applies to published academic studies as well: evaluating each study individually to uncover previously unknown information is a very laborious process. For this reason, in this study, the publications retrieved for the target diseases were analyzed with text analysis and converted into a graph structure that links meaningful terms through biomedical relationships. On the resulting dense graph, pairs of biomedical entities connected by important links such as treats, causes and associated_with were queried. The entity pairs returned by the queries were also checked by manual search and confirmed to be real connections. Retrieving known biomedical entities with the proposed approach thus avoids the time-consuming manual search. The approach also has the potential to reveal unknown or unexplored relationships (e.g., therapeutic, causal, etc.) through multiple binary linking patterns.
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 2
    Classification of Breast Cancer Molecular Subtypes With Grouping-Scoring Approach That Incorporates Disease-Disease Association Information
    (IEEE, 2024-05-15) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
    This study uses modern sequencing technology and large biological databases to investigate the molecular intricacies of complicated diseases like cancer. Using gene expression databases and biomarkers, the research aims to improve breast cancer molecular subtype identification for better patient outcomes. Using the BRCA LumAB_Her2Basal dataset, this study compares an integrative machine learning-based strategy (GediNET) to traditional feature selection approaches across machine learning classifiers. GediNET excels at uncovering crucial disease-disease connections and potential biomarkers using the Grouping-Scoring-Modeling (GSM) approach, which favors gene groupings over individual genes. Our comparative analysis highlights GediNET's exceptional performance, notably in terms of accuracy and Area Under the Curve metrics, underscoring its effectiveness in uncovering the genetic intricacies of breast cancer. Beyond outperforming traditional approaches, GediNET promises to improve disease classification and biomarker identification by improving the understanding of biological mechanisms. The work shows that GediNET's integrative method can advance bioinformatics research by identifying the most informative genes associated with certain diseases, enabling focused and customized medicine.
  • Conference Object
    Citation - WoS: 4
    Blockchain-Based Fog Computing Applications in Healthcare
    (IEEE, 2020-10-05) Adanur, Beyhan; Bakir-Gungor, Burcu; Soran, Ahmet
    Recently, the use of blockchain technology in healthcare has increased. Although blockchain technology has brought several innovations to healthcare, there are still problems waiting to be resolved. To provide alternative solutions to these problems, the use of fog computing together with blockchain technology has been proposed. In this study, the applications of blockchain-based fog computing technology in healthcare are investigated. The aim is to give readers an idea of how blockchain and fog computing can be used together in healthcare. For this purpose, fog computing and blockchain technologies are first introduced. Afterwards, the integration of these areas and the advantages and disadvantages of using these technologies in healthcare are discussed, and a new system architecture is proposed.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    The Identification of Discriminative Single Nucleotide Polymorphism Sets for the Classification of Behcet's Disease
    (IEEE, 2018-09) Gormez, Yasin; Isik, Yunus Emre; Bakir-Gungor, Burcu
    Behcet's disease is a long-term multisystem inflammatory disorder, characterized by recurrent attacks affecting several organs. As genotyping individuals became cheaper and easier following developments in genomic technologies, genome-wide association studies (GWAS) emerged. By studying large case-control groups for a specific disease, these studies identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). Although several genetic risk factors have been identified for Behcet's disease by scanning around a million SNPs, these variations could only explain up to 20% of the disease's genetic risk. In this study, for Behcet's disease classification, all the SNPs genotyped in GWAS are compared with the SNPs selected using genetic knowledge, gain ratio and information gain, with the aim of both reducing the feature size and improving the classification accuracy. In addition, different classification algorithms such as random forest, k-nearest neighbour and logistic regression are used, and their effects on the classification accuracy are investigated. Our results showed that selecting SNPs by genetic information (their GWAS p-values, which indicate the significance of the SNP for the disease) achieves a success rate of at least 81% and provides a 15% to 42% improvement over the other feature selection methods in all classification algorithms. This improvement is statistically sound. While the gain ratio and information gain feature selection techniques yield similar classification accuracies, the models using all SNPs could not exceed 50% accuracy and resulted in the worst performance.
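    Information gain and gain ratio, two of the feature selection criteria named in this abstract, can be computed for a single SNP as in the sketch below. It uses the standard textbook definitions; the genotype coding (0/1/2) and the toy data are assumptions for illustration and do not come from the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on the feature values."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Toy example (hypothetical data): genotypes coded 0/1/2, labels case/control.
labels = ["case", "case", "control", "control"]
informative_snp = [0, 0, 1, 1]    # separates cases from controls perfectly
uninformative_snp = [0, 1, 0, 1]  # carries no information about the label
```

    Ranking SNPs by either score and keeping the top-scoring ones gives a reduced feature set of the kind the abstract compares against p-value-based selection.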
  • Conference Object
    Performance Evaluations of Active Subnetwork Search Methods in Protein-Protein Interaction Networks
    (IEEE, 2019) Gunter, Pinar; Bakir-Gungor, Burcu
    Protein-protein interaction networks are mathematical representations of the physical contacts between proteins in the cell. A group of interconnected proteins in a protein-protein interaction network that contains most of the disease-associated proteins and some other interacting proteins is called an active subnetwork. Active subnetwork search is important for understanding the mechanisms underlying diseases. Active subnetworks are used to discover disease-related regulatory pathways and functional modules and to classify diseases. In the literature there are many methods to search for active subnetworks. The purpose of this study is to compare the performance of different subnetwork identification methods. Using the Rheumatoid Arthritis dataset, the performances of the greedy approach, genetic algorithm, simulated annealing algorithm, prize-collecting Steiner forest and game theory-based subnetwork search methods are compared.
  • Conference Object
    A Top-Down Approach for Finding Affected Pathway Subnetworks in Type 2 Diabetes (Tip 2 Diyabet'te Etkilenen Yolak Alt Ağlarını Bulmak İçin Yukarıdan Aşağıya İşleyen Bir Yaklaşım)
    (IEEE, 2020-10-05) Unlu Yazici, Miray; Bakir-Gungor, Burcu
    Diabetes Mellitus (DM) is a metabolic disorder caused by dysfunction of insulin-producing pancreatic beta cells, insulin resistance, or impairment of insulin functionality. Type 2 Diabetes Mellitus (T2D) is a complex multifactorial disease that accounts for 90% of diabetes cases. In recent years, genome-wide association studies (GWAS) have successfully identified genetic variants associated with T2D risk. However, while conventional GWAS analyses focus on 'the tip of the iceberg' single nucleotide polymorphisms (SNPs), new analysis methods are needed to uncover hidden variations in these studies. In our previous study, we developed a post-GWAS analysis methodology to find disease-associated marker pathways by integrating the human protein-protein interaction network, known biological pathways and potential SNPs. In this study, by adding different in-silico approaches to our methodology, we aim to identify affected pathway subnetworks and affected pathway clusters in addition to the affected protein subnetworks in T2D, and consequently to shed light on the molecular mechanisms of T2D. Using the proposed method, we analyzed T2D GWAS meta-analysis data including 12,931 cases and 57,196 controls. The approach presented here is based on both the significance value of an affected pathway and its topological relationship with neighboring pathways. In the functional enrichment stage of our method, important pathways were obtained using the hypergeometric test and a gene-pathway matrix was formed. Then pathway-pathway similarity values were calculated using the Jaccard index. Using the scores in the similarity matrix, a pathway-pathway network was constructed, and disease-related pathway modules were obtained using subnetwork search algorithms. As a result, genes, pathways and pathway subnetworks that might have a potential role in T2D development were identified, and the categories and classes related to these affected pathways were determined.
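    The enrichment and similarity steps described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the similarity threshold of 0.2, and the toy gene sets are assumptions; only the use of a hypergeometric test and the Jaccard index comes from the abstract.

```python
from itertools import combinations
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """Upper-tail hypergeometric p-value P(X >= observed).

    population: all background genes; successes: genes in the pathway;
    draws: genes in the disease-associated list; observed: their overlap.
    """
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return hits / comb(population, draws)


def jaccard(a, b):
    """Jaccard index of two gene sets: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pathway_network(pathway_genes, threshold=0.2):
    """Return {(pathway_1, pathway_2): similarity} for pairs above threshold."""
    edges = {}
    for p, q in combinations(sorted(pathway_genes), 2):
        score = jaccard(pathway_genes[p], pathway_genes[q])
        if score >= threshold:
            edges[(p, q)] = score
    return edges


# Hypothetical pathways and gene identifiers, for illustration only.
pathways = {
    "pathway_A": {"G1", "G2", "G3", "G4"},
    "pathway_B": {"G3", "G4", "G5"},
    "pathway_C": {"G6", "G7"},
}
edges = pathway_network(pathways)
p_value = hypergeom_pvalue(population=10, successes=4, draws=3, observed=2)
```

    The resulting edge dictionary is the kind of weighted pathway-pathway network on which a subnetwork search algorithm could then be run.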