
Browsing by Author "Yousef, Malik"

Now showing 1 - 20 of 27
  • Article
    AMP-GSM: Prediction of Antimicrobial Peptides via a Grouping–Scoring–Modeling Approach
    (MDPI, 2023) Soylemez, Ummu Gulsum; Yousef, Malik; Bakir-Gungor, Burcu; 0000-0002-6602-772X; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Soylemez, Ummu Gulsum; Bakir-Gungor, Burcu
    Due to the increasing resistance of bacteria to antibiotics, scientists have begun seeking new solutions to this problem. Among the most promising solutions in this field are antimicrobial peptides (AMPs). To identify antimicrobial peptides, and to aid the design and production of novel antimicrobial peptides, there is a growing interest in the development of computational prediction approaches, in parallel with wet-lab experimental studies. The computational approaches aim to understand what controls antimicrobial activity from the perspective of machine learning, and to uncover the biological properties that define antimicrobial activity. In this study, we aim to develop a novel prediction approach that can identify peptides with high antimicrobial activity against selected target bacteria. Along this line, we propose a novel method called AMP-GSM (antimicrobial peptide-grouping-scoring-modeling). AMP-GSM includes three main components: grouping, scoring, and modeling. The grouping component creates sub-datasets by placing the physicochemical, linguistic, sequence, and structure-based features into different groups. The scoring component assigns each group a score according to its ability to distinguish antimicrobial peptides from non-antimicrobial peptides. As the final part of our method, the model built using the top-ranked groups is evaluated (modeling component). The method was tested on three AMP prediction datasets, and the prediction performance of AMP-GSM was compared with several feature selection methods and several classifiers. When we used 10 features (which are members of the physicochemical group), we obtained the highest area under curve (AUC) value for both the Gram-negative (99%) and Gram-positive (98%) datasets. AMP-GSM investigates the most significant feature groups that improve AMP prediction. A number of physicochemical features in AMP-GSM's final selection demonstrate how important these variables are in defining peptide characteristics and how they should be taken into account when creating models to predict peptide activity.
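The grouping, scoring, and modeling steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy data, and the scoring rule (the AUC of each group's mean feature value, standing in for the classifier a real pipeline would train per group) are all assumptions made for illustration.

```python
from statistics import mean

def auc(scores, labels):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def score_groups(X, y, groups):
    """Scoring step: give each feature group a class-separation score.

    X is a list of samples (dict: feature name -> value); groups maps a
    group name to its feature names (the output of the grouping step).
    Assumption: the group score is the AUC of the group's mean feature value.
    """
    result = {}
    for name, feats in groups.items():
        summary = [mean(row[f] for f in feats) for row in X]
        a = auc(summary, y)
        result[name] = max(a, 1 - a)  # separation counts in either direction
    return result

def top_groups(group_scores, k):
    """Select the k best-scoring groups; the modeling step would train on these."""
    return sorted(group_scores, key=group_scores.get, reverse=True)[:k]

# Toy data with hypothetical feature names (not taken from the paper).
X = [
    {"charge": 3.0, "hydrophobicity": 0.8, "length": 20, "helix": 0.5},
    {"charge": 2.5, "hydrophobicity": 0.7, "length": 31, "helix": 0.4},
    {"charge": 0.2, "hydrophobicity": 0.1, "length": 25, "helix": 0.6},
    {"charge": 0.1, "hydrophobicity": 0.2, "length": 28, "helix": 0.3},
]
y = [1, 1, 0, 0]  # 1 = antimicrobial, 0 = not
groups = {
    "physicochemical": ["charge", "hydrophobicity"],
    "sequence": ["length"],
    "structure": ["helix"],
}

scores = score_groups(X, y, groups)
print(top_groups(scores, 1))
```

On this toy data only the physicochemical group separates the two classes perfectly, so it is the one passed on to the modeling step.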
  • Review
    Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data
    (MDPI, ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND, 2021) Yousef, Malik; Kumar, Abhishek; Bakir-Gungor, Burcu; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
    In the last two decades, there have been massive advancements in high-throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This task corresponds to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. Thus, integrative approaches that utilize biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data and the biological background information that is provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of the biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to shed light on disease state dynamics and the mechanisms of disease onset and progression. The integration of various types of biological information will necessitate the development of novel techniques for integration and data analysis. Another aim of this review is to encourage the bioinformatics community to develop new approaches for searching and determining significant groups/clusters of features based on one or more biological grouping functions.
  • Article
    CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques
    (ELSEVIER, 2024) Bakir-Gungor, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre; Yousef, Malik; 0000-0002-2272-6270; 0000-0002-2839-1424; 0009-0002-4373-8526; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre
    Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet there is a need to improve their accuracy through a holistic biological knowledge perspective. This study aims to evaluate CRC-associated metagenomic data at the species, enzyme, and pathway levels by conducting global and population-specific analyses. These analyses utilize relative abundance values from human gut microbiome sequencing data, and robust classification models are built for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features that are identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data globally. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′,3′-cyclic 3′-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.
  • Other
    Classification of Breast Cancer Molecular Subtypes with Grouping-Scoring-Modeling Approach that Incorporates Disease-Disease Association Information
    (IEEE Xplore, 2024) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
    This study uses modern sequencing technology and large biological databases to investigate the molecular intricacies of complex diseases like cancer. Using gene expression databases and biomarkers, the research aims to improve breast cancer molecular subtype identification for better patient outcomes. Using the BRCA LumAB_Her2Basal dataset, this study compares an integrative machine learning-based strategy (GediNET) to traditional feature selection approaches across machine learning classifiers. GediNET excels at uncovering crucial disease-disease connections and potential biomarkers using the Grouping-Scoring-Modeling (GSM) approach, which favors gene groupings over individual genes. Our comparative analysis highlights GediNET's exceptional performance, notably in terms of accuracy and Area Under the Curve metrics, underscoring its effectiveness in uncovering the genetic intricacies of breast cancer. Beyond outperforming traditional approaches, GediNET promises to improve disease classification and biomarker identification by deepening the understanding of biological mechanisms. The work shows that GediNET's integrative method can promote bioinformatics research by identifying the most informative genes associated with certain diseases, enabling focused and customized medicine.
  • Article
    Computational Prediction of Functional MicroRNA-mRNA Interactions
    (HUMANA PRESS INC, 999 RIVERVIEW DR, STE 208, TOTOWA, NJ 07512-1165 USA, 01.01.2019) Demirci, Muserref Duygu Sacar; Yousef, Malik; Allmer, Jens; 0000-0003-2012-0598; AGÜ, Faculty of Life and Natural Sciences, Department of Bioengineering
    Proteins have a strong influence on the phenotype and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many target mRNAs and an mRNA can be targeted by many miRNAs which makes it difficult to experimentally discover all miRNA-mRNA interactions. Therefore, computational methods have been developed for miRNA detection and miRNA target prediction. An abundance of available computational tools makes selection difficult. Additionally, interactions are not currently the focus of investigation although they more accurately define the regulation than pre-miRNA detection or target prediction could perform alone. We define an interaction including the miRNA source and the mRNA target. We present computational methods allowing the investigation of these interactions as well as how they can be used to extend regulatory pathways. Finally, we present a list of points that should be taken into account when investigating miRNA-mRNA interactions. In the future, this may lead to better understanding of functional interactions which may pave the way for disease marker discovery and design of miRNA-based drugs.
  • Article
    Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods
    (FRONTIERS MEDIA SA, AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE CH-1015, SWITZERLAND, 2021) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, Malik; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar
    Human gut microbiota is a complex community of organisms including trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies have accelerated the discovery of human gut microbiota. In this respect, the use of machine learning techniques has become popular for analyzing disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is a pressing need to develop a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the Select K Best approach. To test the performance of the classification based on the features selected by different methods, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of minimizing the microbiota used for the diagnosis of T2D and thus reducing the time and cost. When we performed biological validation of these identified species, we found that some of them are known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers. Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and finds out which subset of microbiota is more informative than other taxa by applying state-of-the-art feature selection methods.
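The 100-fold Monte Carlo cross-validation mentioned in this abstract amounts to repeated random train/test sub-sampling. The sketch below shows only the split generation and score averaging, using the standard library; the 20% test fraction, the fixed seed, and the function names are assumptions, since the abstract does not state them, and the `evaluate` callback stands in for training and testing a classifier such as a random forest.

```python
import random

def monte_carlo_splits(n_samples, n_iterations=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random sub-sampling.

    Unlike k-fold cross-validation, each iteration draws a fresh random
    test set, so a sample may appear in several test sets or in none.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]

def mccv_score(n_samples, evaluate, **kwargs):
    """Average a user-supplied evaluate(train_idx, test_idx) over all splits."""
    results = [evaluate(tr, te) for tr, te in monte_carlo_splits(n_samples, **kwargs)]
    return sum(results) / len(results)

splits = list(monte_carlo_splits(50))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 100 40 10
```

Averaging a metric such as AUC over the 100 splits gives a performance estimate that depends less on any single partition of a small cohort.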
  • Conference Object
    The Effect of Different Classifiers on Recursive Cluster Elimination in the Analysis of Transcriptomic Data
    (Institute of Electrical and Electronics Engineers Inc., 2023) Bulut, Nurten; Bakir-Gungor, Burcu; Qaqish, Bahjat F.; Yousef, Malik; 0000-0002-1895-8749; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bulut, Nurten; Bakir-Gungor, Burcu
    Gene expression data with limited sample size and a large number of genes are frequently encountered in genetic studies. In such high-dimensional data, identification of genes that distinguish between disease states is a challenging task. Feature selection (FS) is a useful approach in dealing with high dimensionality. Support Vector Machines Recursive Cluster Elimination (SVM-RCE) is a technique for FS in high-dimensional data. The SVM-RCE approach has been utilized for identification of clusters of genes whose expression levels correlate with pathological state. A key step in SVM-RCE is the use of an SVM classifier to assign an area under the curve (AUC) score to each gene cluster based on its ability to predict class labels. In this study, we investigate the use of alternative classifiers in the cluster-scoring step. Specifically, we compare Support Vector Machines, Random Forest, XGBoost, Naive Bayes, and linear logistic regression. In addition to AUC score performance evaluation, the algorithms are compared in terms of the number of selected genes at different levels of clustering and in terms of the running time.
  • Article
    GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning
    (FRONTIERS MEDIA SA, 2023) Ersoz, Nur Sebnem; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0003-3343-9936; 0000-0002-2272-6270; AGÜ, Faculty of Life and Natural Sciences, Department of Molecular Biology and Genetics; Ersoz, Nur Sebnem; Bakir-Gungor, Burcu
    Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product. Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype. Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model. Discussion: GeNetOntology will assist geneticists and scientists in identifying a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
  • Conference Object
    Identifying Taxonomic Biomarkers of Colorectal Cancer in Human Intestinal Microbiota Using Multiple Feature Selection Methods
    (Institute of Electrical and Electronics Engineers Inc., 2022) Jabeer, Amhar; Kocak, Aysegul; Akkas, Huseyin; Yenisert, Ferhan; Nalbantoglu, Ozkan Ufuk; Yousef, Malik; Bakir Gungor, Burcu; 0000-0002-6367-7823; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Jabeer, Amhar; Kocak, Aysegul; Akkas, Huseyin; Yenisert, Ferhan; Bakir Gungor, Burcu
    A variety of bacterial species called gut microbiota work together to maintain a steady intestinal environment. The gastrointestinal tract contains a tremendous number of different species including archaea, bacteria, fungi, and viruses. While these organisms are crucial immune system stabilizers, the dysbiosis of the intestinal flora has been related to gastrointestinal disorders including colorectal cancer (CRC), intestinal cancer, irritable bowel syndrome, and inflammatory bowel disease. In the last decade, next-generation sequencing (NGS) methods have accelerated the identification of human gut flora. CRC is a deadly condition that has been on the rise in the last century, affecting half a million people each year. Since early CRC diagnosis is critical for effective treatment, there is an immediate need for a classification system that can expedite CRC diagnosis. In this study, by analyzing the available metagenomics data on CRC, we aim to facilitate CRC diagnosis by finding biomarkers linked with CRC and by building a classification model. We obtained the metagenomic sequencing data of healthy individuals and CRC patients from a metagenome-wide association analysis and classified these data according to the disease stages. The Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Extreme Gradient Boosting (XGBoost), min redundancy max relevance (mRMR), Information Gain (IG), and Select K Best (SKB) feature selection algorithms were utilized to cope with the complexity of the features. We observed that the SKB, IG, and XGBoost techniques made significant contributions to decreasing the microbiota in use for CRC diagnosis, thereby reducing cost and time. We found that our Random Forest classifier outperformed the Adaboost, Support Vector Machine, Decision Tree, Logitboost, and stacking ensemble classifiers in terms of CRC classification performance. Our results reiterated some known and some potential microbiome-associated mechanisms in CRC, which could aid the design of new diagnostics based on the microbiome.
  • Article
    Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods
    (PEERJ INC, 2022) Bakır Güngör, Burcu; Hacılar, Hilal; Jabeer, Amhar; Nalbantoğlu, Özkan Ufuk; Aran, Oya; Yousef, Malik; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakır Güngör, Burcu
    The tremendous boost in next-generation sequencing and in the “omics” technologies makes it possible to characterize the human gut microbiome, that is, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn’s disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as a westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from complete. Due to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in the field of metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), the XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that, compared to the Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD, and these findings might be useful for the development of microbiome-based diagnostics.
  • Article
    Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data
    (MDPI, 2025) Bakir-Gungor, Burcu; Ersoz, Nur Sebnem; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Faculty of Life and Natural Sciences, Department of Molecular Biology and Genetics; Bakir-Gungor, Burcu; Ersoz, Nur Sebnem
    Advances in metagenomics have revolutionized our ability to elucidate links between the microbiome and human diseases. Colorectal cancer (CRC), a leading cause of cancer-related mortality worldwide, has been associated with dysbiosis of the gut microbiome. This study aims to develop a method for identifying CRC-associated microbial enzymes by incorporating biological domain knowledge into the feature selection process. Conventional feature selection techniques often evaluate features individually and fail to leverage biological knowledge during metagenomic data analysis. To address this gap, we propose the enzyme commission (EC)-nomenclature-based Grouping-Scoring-Modeling (G-S-M) method, which integrates biological domain knowledge into feature grouping and selection. The proposed method was tested on a CRC-associated metagenomic dataset collected from eight different countries. Community-level relative abundance values of enzymes were considered as features and grouped based on their EC categories to provide biologically informed groupings. Our findings in randomized 10-fold cross-validation experiments imply that glycosidases, CoA-transferases, hydro-lyases, oligo-1,6-glucosidase, crotonobetainyl-CoA hydratase, and citrate CoA-transferase enzymes can be associated with CRC development as part of different molecular pathways. These enzymes are mostly synthesized by Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Staphylococcus aureus, Streptococcus pneumoniae, and Clostridioides difficile. Comparative evaluation experiments showed that the proposed model consistently outperforms traditional feature selection methods paired with various classifiers.
  • Book Part
    Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis
    (SPRINGER INTERNATIONAL PUBLISHING AG, GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND, 2021) Yousef, Malik; Sayici, Ahmet; Bakir-Gungor, Burcu; AGÜ, Faculty of Engineering, Department of Computer Engineering; Sayici, Ahmet; Bakir-Gungor, Burcu
    Recent advances in high-throughput technologies have resulted in the production of large gene expression datasets for several phenotypes. By comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc., one can identify biomarkers. As opposed to traditional gene selection approaches, integrative gene selection approaches incorporate domain knowledge from external biological resources during gene selection, which improves interpretability and predictive performance. In this respect, Gene Ontology provides cellular component, molecular function, and biological process terms for the products of each gene. In this study, we present a Gene Ontology-based feature selection approach for gene expression data analysis. In our approach, we used the ontology information as grouping (term) information and embedded this information into a machine learning algorithm for selecting the most significant groups (terms) of ontology. Those groups are used to build the machine learning model in order to perform the classification task. The output of the tool is a significant ontology group for the task of 2-class classification applied to the gene expression data. This knowledge allows the researcher to perform more advanced gene expression analyses. We tested our approach on 8 different gene expression datasets. In our experiments, we observed that the tool successfully found the significant ontology terms that would be used in a classification model. We believe that our tool will help geneticists identify affected genes in transcriptomic data, and this information could enable the design of platforms to assist diagnosis, to assess patients' prognoses, and to create patient treatment plans.
  • Conference Object
    Integrative analyses in omics data: Machine learning perspective
    (Deutsche Gesellschaft fur Medizinische Informatik, Biometrie und Epidemiologie e.V., 2023) Yazici, Miray Unlu; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0001-8165-6164; 0000-0002-2272-6270; AGÜ, Faculty of Life and Natural Sciences, Department of Bioengineering; Yazici, Miray Unlu; Bakir-Gungor, Burcu
    Developments in high-throughput technologies have enabled the production of an immense amount of knowledge at the multi-omics level. For complex diseases that are affected by multiple factors, single-omics datasets might not be sufficient to unveil the molecular mechanisms of heterogeneous diseases. Providing a comprehensive and systematic overview to explain disease hallmarks in significant depth is critical. Utilizing multi-omics datasets has led to the development of a variety of tools and platforms. Machine learning models are utilized in a wide variety of tools to tackle the complexity of disorders and to identify new biomolecular signatures and potential markers. These approaches are based on training models to make predictions and to classify the given data. In this review, we describe current machine learning-based approaches and available implementations. Challenges in elucidating the mechanisms of disease onset and progression, as well as the future development of the field of medicine, will be discussed. The importance of interpreting model output in light of the corresponding biological knowledge will also be covered in this review.
  • Article
    Invention of 3Mint for feature grouping and scoring in multi-omics
    (FRONTIERS MEDIA SA, 2023) Yazici, Miray Unlu; Marron, J. S.; Bakir-Gungor, Burcu; Zou, Fei; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
    Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and of targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA), and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis, and treatment of disease through patterns only available from the 3-way interactions between these 3 omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with a lower number of genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, while achieving similar performance metrics (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. 
The 3Mint tool and all other supplementary files are available at .
  • Article
    miRcorrNet: machine learning-based integration of miRNA and mRNA expression profiles, combined with feature grouping and ranking
    (PEERJ INC341-345 OLD ST, THIRD FLR, LONDON EC1V 9LL, ENGLAND, 2021) Yousef, Malik; Goy, Gokhan; Mitra, Ramkrishna; Eischen, Christine M.; Jabeer, Amhar; Bakir-Gungor, Burcu; 0000-0002-2272-6270; 0000-0001-7678-0355; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Goy, Gokhan; Jabeer, Amhar; Bakir-Gungor, Burcu
    A better understanding of disease development and progression mechanisms at the molecular level is critical both for the diagnosis of a disease and for the development of therapeutic approaches. Advancements in high-throughput technologies have made it possible to generate mRNA and microRNA (miRNA) expression profiles, and the integrative analysis of these profiles has made it possible to uncover the functional effects of RNA expression in complex diseases, such as cancer. Several studies attempt to integrate miRNA and mRNA expression profiles using statistical methods such as Pearson correlation, and then combine the result with enrichment analysis. In this study, we developed a novel tool called miRcorrNet, which performs machine learning-based integration to analyze miRNA and mRNA gene expression profiles. miRcorrNet groups mRNAs based on their correlation to miRNA expression levels and hence generates groups of target genes associated with each miRNA. Then, these groups are subject to a rank function for classification. We have evaluated our tool using miRNA and mRNA expression profiling data downloaded from The Cancer Genome Atlas (TCGA), and performed comparative evaluation with existing tools. In our experiments we show that miRcorrNet performs as well as other tools in terms of accuracy (reaching more than 95% AUC value). Additionally, miRcorrNet includes ranking steps to separate two classes, namely case and control, which is not available in other tools. We have also evaluated the performance of miRcorrNet using a completely independent dataset. Moreover, we conducted a comprehensive literature search to explore the biological functions of the identified miRNAs. We have validated our significantly identified miRNA groups against known databases, which yielded about 90% accuracy. Our results suggest that miRcorrNet is able to accurately prioritize pan-cancer regulating high-confidence miRNAs.
The miRcorrNet tool and all other supplementary files are available at https://github.com/malikyousef/miRcorrNet.
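The grouping step described in the abstract above (collecting, for each miRNA, the mRNAs whose expression correlates with it) can be illustrated with a minimal sketch. This is not the miRcorrNet implementation: the function names, the use of the absolute Pearson coefficient, the 0.8 threshold, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def group_by_correlation(mirna_expr, mrna_expr, threshold=0.8):
    """Map each miRNA to the mRNAs with |r| >= threshold (assumed cutoff)."""
    return {
        mirna: [gene for gene, values in mrna_expr.items()
                if abs(pearson(mirna_values, values)) >= threshold]
        for mirna, mirna_values in mirna_expr.items()
    }


# Toy expression values across five samples (made up for illustration).
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0, 5.0]}
mrna_expr = {
    "GENE1": [5.0, 4.0, 3.0, 2.0, 1.0],   # strongly anti-correlated
    "GENE2": [2.0, 4.0, 6.0, 8.0, 10.0],  # strongly correlated
    "GENE3": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated with miR-A
}
groups = group_by_correlation(mirna_expr, mrna_expr)
print(groups)
```

In this sketch each resulting group would then be scored and ranked by a classifier, which is the step the abstract calls the rank function; that part is omitted here.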
  • Conference Object
    miRcorrNetPro: Unraveling Algorithmic Insights through Cross-Validation in Multi-Omics Integration for Comprehensive Data Analysis
    (Institute of Electrical and Electronics Engineers Inc., 2023) Yazici, Miray Unlu; Yousef, Malik; Marron J.S.; Bakir-Gungor, Burcu; 0000-0001-8165-6164; 0000-0002-2272-6270; AGÜ, Yaşam ve Doğa Bilimleri Fakültesi, Biyomühendislik Bölümü; Yazici, Miray Unlu; Bakir-Gungor, Burcu
    High-throughput -omics technologies facilitate the investigation of regulatory mechanisms of complex diseases. Along this line, scientists develop promising tools and methods to extend our understanding at the molecular and functional levels. To this end, the miRcorrNet tool performs integrative analysis of microRNA (miRNA) and gene expression profiles via a machine learning (ML) approach to identify significant miRNA groups and their associated target genes. In this study, we propose the miRcorrNetPro tool, which extends miRcorrNet by tracking group scoring, ranking and other information through the cross-validation iterations. Heatmap visualizations enable deep novel insights into the collective behavior of clusters of groups in cellular signaling and hence facilitate detection of potential biomarkers for the disease under investigation. Although miRcorrNetPro is designed as a generic tool, here we present our findings and potential miRNA biomarkers for Breast Cancer (BRCA). The miRcorrNetPro tool and all other supplementary files are available at https://github.com/MirayUnlu/miRcorrNetPro.
  • Article
    miRdisNET: Discovering microRNA biomarkers that are associated with diseases utilizing biological knowledge-based machine learning
    (Frontiers Media S.A., 2023) Jabeer, Amhar; Temiz, Mustafa; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Jabeer, Amhar; Temiz, Mustafa; Bakir-Gungor, Burcu
    During recent years, biological experiments and increasing evidence have shown that microRNAs play an important role in the diagnosis and treatment of human complex diseases. Therefore, to diagnose and treat human complex diseases, it is necessary to reveal the associations between a specific disease and related miRNAs. Although current computational models based on machine learning attempt to determine miRNA-disease associations, the accuracy of these models needs to be improved, and candidate miRNA-disease relations need to be evaluated from a biological perspective. In this paper, we propose a computational model named miRdisNET to predict potential miRNA-disease associations. Specifically, miRdisNET requires two types of data as input files: miRNA expression profiles and known disease-miRNA associations. First, we generate subsets of specific diseases by applying the grouping component. These subsets contain miRNA expressions with class labels associated with each specific disease. Then, we assign an importance score to each group by using a machine learning method for classification. Finally, we apply a modeling component and obtain outputs. One of the most important outputs of miRdisNET is the performance of miRNA-disease prediction. Compared with the existing methods, miRdisNET obtained the highest AUC value of 0.9998. Another output of miRdisNET is a list of significant miRNAs for the disease under study. The miRNAs identified by miRdisNET are validated by referring to the gold-standard databases which hold information on experimentally verified microRNA-disease associations. miRdisNET has been developed to predict candidate miRNAs for new diseases, where the miRNA-disease relation is not yet known. In addition, miRdisNET presents candidate disease-disease associations based on shared miRNA knowledge. The miRdisNET tool and other supplementary files are publicly available at: https://github.com/malikyousef/miRdisNET.
  • Article
    miRModuleNet: Detecting miRNA-mRNA Regulatory Modules
    (FRONTIERS MEDIA SA, AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE CH-1015, SWITZERLAND, 2022) Yousef, Malik; Goy, Gokhan; Bakır Güngör, Burcu; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Bakır Güngör, Burcu; Göy, Gökhan
    Increasing evidence that microRNAs (miRNAs) play a key role in carcinogenesis has revealed the need for elucidating the mechanisms of miRNA regulation and the roles of miRNAs in gene-regulatory networks. A better understanding of the interactions between miRNAs and their mRNA targets will provide a better understanding of the complex biological processes that occur during carcinogenesis. Increased efforts to reveal these interactions have led to the development of a variety of tools to detect and understand them. We have recently described a machine learning approach, miRcorrNet, based on grouping and scoring (ranking) groups of genes, where each group is associated with a miRNA and the group members are genes with expression patterns that are correlated with this specific miRNA. The miRcorrNet tool requires two types of -omics data, miRNA and mRNA expression profiles, as input. In this study we describe miRModuleNet, which groups mRNAs (genes) that are correlated with each miRNA to form a star shape, which we identify as a miRNA-mRNA regulatory module. A scoring procedure is then applied to each module to further assess its contribution in terms of classification. An important output of miRModuleNet is a hierarchical list of significant miRNA-mRNA regulatory modules. miRModuleNet was further validated on external datasets for their disease associations, and functional enrichment analysis was also performed. The application of miRModuleNet aids the identification of functional relationships between significant biomarkers and reveals essential pathways involved in cancer pathogenesis.
  • Conference Object
    Population Specific Classification of Colorectal Cancer with Meta-Analysis of Metagenomic Data
    (Institute of Electrical and Electronics Engineers Inc., 2023) Temiz, Mustafa; Yousef, Malik; Bakir-Gungor, Burcu; 0000-0002-2272-6270; 0000-0002-2839-1424; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Bakir-Gungor, Burcu; Temiz, Mustafa
    Advances in next-generation sequencing and "-omics" technologies make it possible to characterize the human gut microbiome. While some of these microorganisms are key regulators of our immune system, modulation of the microbiota leads to various diseases. Colorectal cancer (CRC), the third most common cancer type worldwide, arises under the influence of genetic mutations, environmental conditions, and anomalies in the gut microbiota. This study aims to perform meta-analysis for different populations by applying various machine learning methods to species-level metagenomic datasets, and thereby to build classification models that can assist CRC diagnosis. In this study, three different meta-analyses were carried out on 9 metagenomic datasets from 8 countries: within-population, cross-population, and leave one dataset out (LODO). Four classification algorithms (Random Forest (RF), LogitBoost, AdaBoost, and Decision Tree (DT)) were used to develop the model for assisting CRC diagnosis. In the experiments, the best performance, an AUC of 0.98, was obtained with the Random Forest algorithm in the cross-population evaluation when the JP population was used as the training dataset and the JPN population as the test dataset.
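The leave one dataset out (LODO) scheme named in the abstract above can be sketched as follows: each dataset in turn is held out for testing while all remaining datasets are pooled for training. This is an illustrative sketch only; the function name, cohort names, and sample identifiers are hypothetical and do not come from the study.

```python
def leave_one_dataset_out(datasets):
    """Yield (held_out_name, train_samples, test_samples) for each dataset."""
    for held_out in datasets:
        # Pool the samples of every dataset except the held-out one.
        train = [sample
                 for name, samples in datasets.items() if name != held_out
                 for sample in samples]
        yield held_out, train, list(datasets[held_out])


# Hypothetical cohorts with placeholder sample identifiers.
datasets = {
    "cohort_A": ["a1", "a2"],
    "cohort_B": ["b1"],
    "cohort_C": ["c1", "c2", "c3"],
}
splits = list(leave_one_dataset_out(datasets))
for name, train, test in splits:
    print(name, len(train), len(test))
```

A classifier would be trained on `train` and evaluated on `test` in each round, so that every reported score comes from a cohort the model has never seen.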
  • Article
    Prediction of Linear Cationic Antimicrobial Peptides Active against Gram-Negative and Gram-Positive Bacteria Based on Machine Learning Models
    (MDPI, 2022) Bakır Güngör, Burcu; Söylemez, Ümmü Gülsüm; Yousef, Malik; Kesmen, Zulal; Büyükkiraz, Mine Erdem; ABC-1093-2021; 0000-0001-8780-6303; 0000-0002-4505-6871; 0000-0002-6602-772X; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Bakır Güngör, Burcu
    Antimicrobial peptides (AMPs) are considered promising alternatives to conventional antibiotics for overcoming the growing problem of antibiotic resistance. Computational prediction approaches are receiving increasing interest as a way to identify and design the best candidate AMPs prior to in vitro tests. In this study, we focused on linear cationic peptides with non-hemolytic activity, which were downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the MIC (minimum inhibitory concentration) values, we assigned a positive label to a peptide if it shows antimicrobial activity; otherwise, the peptide is labeled as negative. Here, we focused on the peptides showing antimicrobial activity against Gram-negative and against Gram-positive bacteria separately, and we created two datasets accordingly. Ten different physico-chemical properties of the peptides are calculated and used as features in our study. Following data exploration and data preprocessing steps, a variety of classification algorithms are used with 100-fold Monte Carlo Cross-Validation to build models and to predict the antimicrobial activity of the peptides. Among the generated models, Random Forest resulted in the best performance metrics for both the Gram-negative dataset (Accuracy: 0.98, Recall: 0.99, Specificity: 0.97, Precision: 0.97, AUC: 0.99, F1: 0.98) and the Gram-positive dataset (Accuracy: 0.95, Recall: 0.95, Specificity: 0.95, Precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination is applied. This prediction approach might be useful to evaluate the antibacterial potential of a candidate peptide sequence before moving to the experimental studies.
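Monte Carlo cross-validation, as mentioned in the abstract above, repeatedly draws a fresh random train/test partition instead of rotating through fixed folds. The sketch below generates 100 such partitions; the 10% test fraction, the fixed seed, and the function name are assumptions for illustration, since the abstract does not state the split ratio.

```python
import random


def monte_carlo_splits(n_samples, n_iterations=100, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for repeated random partitions."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest train.
        yield indices[n_test:], indices[:n_test]


splits = list(monte_carlo_splits(n_samples=50))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging a metric such as AUC over all iterations gives a performance estimate; unlike k-fold cross-validation, a given sample may appear in several test sets or in none.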