Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

  • Conference Object
    Citation - Scopus: 2
    miRcorrNetPro: Unraveling Algorithmic Insights Through Cross-Validation in Multi-Omics Integration for Comprehensive Data Analysis
    (Institute of Electrical and Electronics Engineers Inc., 2023-12-05) Ünlü Yazici, Miray; Yousef, Malik; Marron, J. S.; Bakir-Güngör, Burcu
    High-throughput -omics technologies facilitate the investigation of the regulatory mechanisms of complex diseases. Along these lines, researchers are developing promising tools and methods to extend our understanding at the molecular and functional levels. To this end, the miRcorrNet tool performs integrative analysis of microRNA (miRNA) and gene expression profiles via a machine learning (ML) approach to identify significant miRNA groups and their associated target genes. In this study, we propose the miRcorrNetPro tool, which extends miRcorrNet by tracking group scores, rankings, and other information across the cross-validation iterations. Heatmap visualizations provide novel insights into the collective behavior of clusters of groups in cellular signaling and hence facilitate the detection of potential biomarkers for the disease under investigation. Although miRcorrNetPro is designed as a generic tool, here we present our findings and potential miRNA biomarkers for breast cancer (BRCA). The miRcorrNetPro tool and all other supplementary files are available at https://github.com/Miray-Unlu/miRcorrNetPro. © 2024 Elsevier B.V., All rights reserved.
  • Article
    Citation - WoS: 6
    Citation - Scopus: 7
    The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
    (IEEE Computer Soc, 2022-05-01) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
    Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors both play a role in the development of BD. To find the underlying causes, genetic variation across thousands of genes must be analyzed, and further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomic data and for selecting the most representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for clinical diagnosis. In this study, we identify representative SNPs using feature selection methods that incorporate biological information, and we develop a machine learning model for diagnosing Behcet's disease. By combining biological information with machine learning classifiers, we achieve up to 99.64 percent accuracy in disease prediction using only 13,611 of 311,459 SNPs. In addition, we identify the most distinctive SNPs by performing repeated feature selection in cross-validation experiments.
  • Conference Object
    Citation - WoS: 16
    Citation - Scopus: 20
    Machine Learning Analysis of Inflammatory Bowel Disease-Associated Metagenomics Dataset
    (Institute of Electrical and Electronics Engineers Inc., 2018-09) Hacilar, Hilal; Nalbantoğlu, Özkan Ufuk; Bakir-Güngör, Burcu
    There is an ongoing interplay between humans and our microbial communities. The microorganisms living in our gut produce energy from our food, strengthen our immune system, break down foreign products, and release metabolites and hormones that are significant for regulating our physiology. Shifts away from this 'healthy' gut microbiome are considered to be associated with many diseases. Inflammatory bowel diseases (IBD), including Crohn's disease and ulcerative colitis, are gut-related disorders affecting the intestinal tract. Although some metagenomic studies on IBD have been conducted recently, our current understanding of the precise relationships between the human gut microbiome and IBD remains limited. In this regard, state-of-the-art machine learning approaches have become popular for addressing a variety of questions, such as the early diagnosis of certain diseases using the human microbiota. In this study, we investigate which subsets of the gut microbiota are most strongly associated with IBD and whether disease-associated biomarkers can be detected by applying state-of-the-art machine learning algorithms and appropriate feature selection methods.
  • Conference Object
    Citation - Scopus: 5
    Identifying Taxonomic Biomarkers of Colorectal Cancer in Human Intestinal Microbiota Using Multiple Feature Selection Methods
    (Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Jabeer, Amhar; Kocak, Aysegul; Akkaş, Huseyin; Yenisert, Ferhan; Nalbantoğlu, Özkan Ufuk; Yousef, Malik; Bakir-Güngör, Burcu
    A variety of bacterial species, collectively called the gut microbiota, work together to maintain a stable intestinal environment. The gastrointestinal tract contains a tremendous number of different species, including archaea, bacteria, fungi, and viruses. While these organisms are crucial stabilizers of the immune system, dysbiosis of the intestinal flora has been linked to gastrointestinal disorders including colorectal cancer (CRC), intestinal cancer, irritable bowel syndrome, and inflammatory bowel disease. In the last decade, next-generation sequencing (NGS) methods have accelerated the identification of the human gut flora. CRC is a deadly condition that has been on the rise over the last century, affecting half a million people each year. Since early CRC diagnosis is critical for effective treatment, there is an urgent need for a classification system that can expedite CRC diagnosis. In this study, by analyzing the available metagenomic data on CRC, we aim to facilitate CRC diagnosis by identifying biomarkers linked with CRC and by building a classification model. We obtained metagenomic sequencing data of healthy individuals and CRC patients from a metagenome-wide association analysis and classified these data according to disease stage. Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Extreme Gradient Boosting (XGBoost), Minimum Redundancy Maximum Relevance (mRMR), Information Gain (IG), and Select K Best (SKB) feature selection algorithms were used to cope with the complexity of the features. We observed that the SKB, IG, and XGBoost techniques contributed significantly to reducing the number of microbial features needed for CRC diagnosis, thereby reducing cost and time. Our Random Forest classifier outperformed AdaBoost, Support Vector Machine, Decision Tree, LogitBoost, and stacking ensemble classifiers in terms of CRC classification performance. Our results highlight both known and potential microbiome-associated mechanisms in CRC, which could aid the design of new microbiome-based diagnostics.
  • Conference Object
    Citation - WoS: 22
    Citation - Scopus: 52
    Evaluation of Classification Algorithms, Linear Discriminant Analysis and a New Hybrid Feature Selection Methodology for the Diagnosis of Coronary Artery Disease
    (Institute of Electrical and Electronics Engineers Inc., 2018-12) Kolukisa, Burak; Hacilar, Hilal; Göy, Gökhan; Kus, Mustafa; Bakir-Güngör, Burcu; Aral, Atilla; Güngör, Vehbi Çağrı
    According to the World Health Organization (WHO), 31% of all deaths worldwide in 2016 (17.9 million) were due to cardiovascular diseases (CVD). With the development of information technologies, it has become possible to predict at a lower cost whether people have heart disease by checking certain physical and biochemical values. In this study, we evaluated a set of classification algorithms and linear discriminant analysis, and we propose a new hybrid feature selection methodology for the diagnosis of coronary heart disease (CHD). Using three publicly available heart disease diagnosis datasets from the UCI Machine Learning Repository, we conducted comparative performance evaluations in terms of accuracy, sensitivity, specificity, F-measure, AUC, and running time.
  • Article
    Citation - Scopus: 8
    Building a Challenging Medical Dataset for Comparative Evaluation of Classifier Capabilities
    (Elsevier Ltd, 2024-08) Bozkurt, Berat; Coskun, Kerem; Bakal, Gokhan
    Since the 2000s, digitalization has been a crucial transformation in our lives. However, digitalization also produces a large volume of unstructured textual data to be processed, including articles, clinical records, web pages, and social media posts. Classification, a critical analysis task, assigns given textual entities to the correct categories. Categorizing documents from different domains is straightforward because the instances are unlikely to share similar contexts. Document classification within a single domain, however, is more complicated because the documents share the same context. We therefore aim to classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) by constructing machine learning and deep learning models. We used 383,914 medical articles about these four cancer types collected via the PubMed API. To build the classification models, we split the dataset into 70% training, 20% testing, and 10% validation sets. We built widely used machine learning models (Logistic Regression, XGBoost, CatBoost, and Random Forest classifiers) and modern deep learning models (convolutional neural networks - CNN, long short-term memory - LSTM, and gated recurrent unit - GRU). To evaluate the models, we computed the average classification performance (precision, recall, F-score) over ten distinct dataset splits. The best-performing deep learning model(s) achieved a superior F1 score of 98%. Traditional machine learning models also achieved reasonably high F1 scores, with 95% in the worst-performing case. Ultimately, we constructed multiple models to classify articles that together form a hard-to-classify dataset in the medical domain.
  • Conference Object
    A Comparative Study on Psychiatric Disorders: Identification of Shared Pathways and Common Agents
    (Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Kuzudisli, Cihan; Bakir-Güngör, Burcu
    Distinct but closely related diseases generally present shared symptoms, which suggest possible overlaps among their pathogenic mechanisms. Identifying significantly impacted shared pathways and other common agents is expected to elucidate the etiology of these disorders and to help design better intervention strategies. In this research effort, we studied six psychiatric disorders: schizophrenia (SCZ), anorexia (AN), bipolar disorder (BD), depressive disorder (DD), autism (AU), and attention deficit hyperactivity disorder (ADHD). Our methodology consists of two parts: in Part I, common susceptibility genes, and in Part II, genome-wide association study (GWAS) data were used to find enriched pathways of psychiatric disorders. Fifty-nine KEGG pathways were identified in both parts, 31 of which are disease pathways. Pathways related to cancer and infectious diseases predominated. Most of the identified pathways were consistent with previous studies in the literature. Combining susceptibility genes with GWAS data is an effective approach for identifying significantly impacted pathways in multifactorial diseases. Shared modules were then determined by applying hierarchical clustering to the enriched pathways. These modules may reveal the associations of psychiatric disorders with the enriched pathways. Taken together, common pathways and shared modules are expected to highlight the causative factors and important mechanisms behind complex psychiatric diseases, leading to effective drug discovery.
  • Conference Object
    Citation - Scopus: 7
    A Comparative Analysis on Medical Article Classification Using Text Mining & Machine Learning Algorithms
    (Institute of Electrical and Electronics Engineers Inc., 2021-09-15) Kolukisa, Burak; Dedeturk, Bilge Kagan; Dedeturk, Beyhan Adanur; Gulsen, Abdulkadir; Bakal, Gokhan
    Document classification is a widely studied research field across multiple domains. The core motivation for automated classification is that manual classification is impractical given exponentially growing document volumes. We therefore need automated computational approaches, such as machine learning models combined with data and text mining techniques. In this study, we focused on classifying medical articles on common cancer types, given the significance of the field and the substantial number of available documents. We deliberately targeted MEDLINE articles about common cancer types because most cancer types share a similar literature composition, which makes the classification effort more challenging. To this end, we built multiple machine learning models, including both traditional and deep learning architectures. The LSTM model achieved the best performance (an F-score of about 82%). Overall, our results demonstrate the effectiveness of combining text mining and machine learning methods to distinguish medical articles on common cancer types.
  • Conference Object
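    Performance Evaluations of Active Subnetwork Search Methods in Protein-Protein Interaction Networks [Turkish: Protein-Protein Etkileşim Ağlarında Aktif Alt Ağ Arama Yöntemlerinin Performans Değerlendirmeleri]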
    (Institute of Electrical and Electronics Engineers Inc., 2019-09) Güner, Pinar; Bakir-Güngör, Burcu
    Protein-protein interaction networks are mathematical representations of the physical contacts between proteins in the cell. A group of interconnected proteins in such a network that contains most of the disease-associated proteins, together with some other interacting proteins, is called an active subnetwork. Active subnetwork search is important for understanding the mechanisms underlying diseases. Active subnetworks are used to discover disease-related regulatory pathways and functional modules, and to classify diseases. Many methods for searching for active subnetworks have been proposed in the literature. The purpose of this study is to compare the performance of different subnetwork identification methods. Using a rheumatoid arthritis dataset, we compare the performance of greedy, genetic algorithm, simulated annealing, prize-collecting Steiner forest, and game-theory-based subnetwork search methods.
  • Conference Object
    Population Specific Classification of Colorectal Cancer With Meta-Analysis of Metagenomic Data
    (Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Temiz, Mustafa; Yousef, Malik; Bakir-Güngör, Burcu
    Advances in next-generation sequencing and '-omics' technologies make it possible to characterize the human gut microbiome. While some of these microorganisms are important regulators of our immune system, changes in the microbiota can lead to a variety of diseases. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental conditions, and abnormalities in the gut microbiota. Using various machine learning methods and meta-analysis techniques, this study aims to build a classification model that can help in CRC diagnosis by analyzing species-level metagenomic datasets from different populations. Using nine metagenomic datasets from eight countries, three types of meta-analysis are performed: within-population, cross-population, and leave-one-dataset-out (LODO), in which one population is held out for testing and the rest are used for training. For CRC classification, four algorithms are used: Random Forest (RF), LogitBoost, AdaBoost, and Decision Tree (DT). The best performance was obtained with Random Forest, which achieved an AUC of 0.98 in the cross-population evaluation when trained on the JP dataset and tested on the JPN population.
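The following is a minimal sketch, not the authors' code, of the cross-population and LODO evaluation schemes described in the last abstract. It only enumerates train/test splits over a dictionary of datasets. A classifier such as Random Forest would then be fitted on each training split. The function names, the `Sample` type, and the toy datasets `A`, `B`, `C` are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical sample type: (feature vector, label), e.g. species abundances and CRC/healthy.
Sample = Tuple[Sequence[float], int]


def cross_population_pairs(names: Sequence[str]) -> List[Tuple[str, str]]:
    """Cross-population scheme: train on one population, test on a different one."""
    return [(train, test) for train in names for test in names if train != test]


def lodo_splits(
    datasets: Dict[str, List[Sample]],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Leave-one-dataset-out: hold out each dataset in turn, pool the others for training."""
    for held_out in datasets:
        train = [
            sample
            for name, samples in datasets.items()
            if name != held_out
            for sample in samples
        ]
        test = list(datasets[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Toy stand-ins for per-country datasets (not real data).
    toy = {
        "A": [([0.1], 0), ([0.9], 1)],
        "B": [([0.2], 0)],
        "C": [([0.8], 1), ([0.7], 1)],
    }
    for held_out, train, test in lodo_splits(toy):
        print(held_out, len(train), len(test))
    # A 3 2
    # B 4 1
    # C 3 2
    print(cross_population_pairs(list(toy)))
```

If the samples are kept in one array with a dataset label per row, scikit-learn's `LeaveOneGroupOut` splitter produces the same LODO partitioning.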