PubMed-Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/397
11 results
Search Results
Correction
Correction: Engineering Novel Features for Diabetes Complication Prediction Using Synthetic Electronic Health Records (Frontiers Media S.A., 2025-08-29)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik

Article
Citation - WoS: 26; Citation - Scopus: 33
miRmoduleNet: Detecting miRNA-mRNA Regulatory Modules (Frontiers Media S.A., 2022-04-12)
Yousef, Malik; Goy, Gokhan; Bakir-Gungor, Burcu
Increasing evidence that MicroRNAs (miRNAs) play a key role in carcinogenesis has revealed the need for elucidating the mechanisms of miRNA regulation and the roles of miRNAs in gene-regulatory networks. A better understanding of the interactions between miRNAs and their mRNA targets will shed light on the complex biological processes that occur during carcinogenesis. Increased efforts to reveal these interactions have led to the development of a variety of tools to detect and understand them. We have recently described a machine learning approach, miRcorrNet, based on grouping and scoring (ranking) groups of genes, where each group is associated with a miRNA and the group members are genes with expression patterns that are correlated with this specific miRNA. The miRcorrNet tool requires two types of -omics data, miRNA and mRNA expression profiles, as input. In this study we describe miRModuleNet, which groups mRNAs (genes) that are correlated with each miRNA to form a star shape, which we identify as a miRNA-mRNA regulatory module. A scoring procedure is then applied to each module to further assess its contribution in terms of classification. An important output of miRModuleNet is a hierarchical list of significant miRNA-mRNA regulatory modules. miRModuleNet was further validated on external datasets for their disease associations, and functional enrichment analysis was also performed.
The application of miRModuleNet aids the identification of functional relationships between significant biomarkers and reveals essential pathways involved in cancer pathogenesis.

Article
Citation - WoS: 20; Citation - Scopus: 24
miRdisNET: Discovering MicroRNA Biomarkers That Are Associated With Diseases Utilizing Biological Knowledge-Based Machine Learning (Frontiers Media S.A., 2023-01-12)
Jabeer, Amhar; Temiz, Mustafa; Bakir-Gungor, Burcu; Yousef, Malik
In recent years, biological experiments and increasing evidence have shown that MicroRNAs play an important role in the diagnosis and treatment of human complex diseases. Therefore, to diagnose and treat human complex diseases, it is necessary to reveal the associations between a specific disease and related miRNAs. Although current computational models based on machine learning attempt to determine miRNA-disease associations, the accuracy of these models needs to be improved, and candidate miRNA-disease relations need to be evaluated from a biological perspective. In this paper, we propose a computational model named miRdisNET to predict potential miRNA-disease associations. Specifically, miRdisNET requires two types of data, i.e., miRNA expression profiles and known disease-miRNA associations, as input files. First, we generate subsets of specific diseases by applying the grouping component. These subsets contain miRNA expressions with class labels associated with each specific disease. Then, we assign an importance score to each group by using a machine learning method for classification. Finally, we apply a modeling component and obtain outputs. One of the most important outputs of miRdisNET is the performance of miRNA-disease prediction. Compared with the existing methods, miRdisNET obtained the highest AUC value of 0.9998. Another output of miRdisNET is a list of significant miRNAs for the disease under study.
The miRNAs identified by miRdisNET are validated by referring to the gold-standard databases which hold information on experimentally verified MicroRNA-disease associations. miRdisNET has been developed to predict candidate miRNAs for new diseases, where the miRNA-disease relation is not yet known. In addition, miRdisNET presents candidate disease-disease associations based on shared miRNA knowledge. The miRdisNET tool and other supplementary files are publicly available at: .

Article
Citation - WoS: 6; Citation - Scopus: 7
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease (IEEE Computer Soc, 2022-05-01)
Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes in thousands of genes should be analyzed. In addition, further analysis is needed to find out which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomics and selecting representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods that incorporate biological information, and aimed to develop a machine learning model for diagnosing Behcet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs.
In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.

Article
Citation - WoS: 10; Citation - Scopus: 15
Textnettopics Pro, a Topic Model-Based Text Classification for Short Text by Integration of Semantic and Document-Topic Distribution Information (Frontiers Media S.A., 2023-10-05)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category. Article titles are concise descriptions of the articles' content with valuable information that can be useful in document classification and categorization. However, shortness, data sparseness, limited word occurrences, and the inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms to these short texts, making their classification a challenging task. This study first explores the performance of our earlier approach, TextNetTopics, on short text. Second, we propose an advanced version called TextNetTopics Pro, a novel short-text classification framework that combines lexical features organized in topics of words with the topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate our proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the Biomedical field, and the other one is related to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without using the abstracts.
Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling the imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers.

Article
Citation - WoS: 15; Citation - Scopus: 15
PriPath: Identifying Dysregulated Pathways From Differential Gene Expression via Grouping, Scoring, and Modeling With an Embedded Feature Selection Approach (BMC, 2023-02-23)
Yousef, Malik; Ozdemir, Fatma; Jaber, Amhar; Allmer, Jens; Bakir-Gungor, Burcu
Background: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases.
Results: PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling, to analyze gene expression with KEGG pathways.
In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research.
Conclusions: PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.

Article
Citation - WoS: 5; Citation - Scopus: 5
Novel Antimicrobial Peptide Design Using Motif Match Score Representation (IEEE Computer Soc, 2024-11)
Soylemez, Ummu Gulsum; Yousef, Malik; Kesmen, Zulal; Bakir-Gungor, Burcu
Antimicrobial peptides (AMPs) have drawn the interest of researchers since they offer an alternative to traditional antibiotics in the fight against antibiotic resistance and exhibit additional pharmaceutically significant properties. Recently, computational approaches have attempted to reveal how antibacterial activity is determined from a machine learning perspective, aiming to find the biological cues or characteristics that control antimicrobial activity by incorporating motif match scores.
This study is dedicated to the development of a machine learning framework aimed at devising novel antimicrobial peptide (AMP) sequences potentially effective against Gram-positive/Gram-negative bacteria. In order to design newly generated sequences classified as either AMP or non-AMP, various classification models were trained. These novel sequences underwent validation utilizing the "DBAASP: strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented herein represent a significant stride in this computational research, streamlining the process of AMP creation or modification within wet lab environments.

Article
Citation - WoS: 16; Citation - Scopus: 20
Invention of 3Mint for Feature Grouping and Scoring in Multi-Omics (Frontiers Media S.A., 2023-03-15)
Yazici, Miray Unlu; Marron, J. S.; Bakir-Gungor, Burcu; Zou, Fei; Yousef, Malik; Unlu Yazici, Miray
Advanced genomic and molecular profiling technologies accelerated the enlightenment of the regulatory mechanisms behind cancer development and progression, and the targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer is one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between -omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), MicroRNA (miRNA) and methylation data.
Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these 3-omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with a lower number of genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, at similar performance (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at .

Article
Citation - WoS: 16; Citation - Scopus: 21
GeNetOntology: Identifying Affected Gene Ontology Terms via Grouping, Scoring, and Modeling of Gene Expression Data Utilizing Biological Knowledge-Based Machine Learning (Frontiers Media S.A., 2023-08-21)
Ersoz, Nur Sebnem; Bakir-Gungor, Burcu; Yousef, Malik
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed.
Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.
Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.

Article
Citation - WoS: 1; Citation - Scopus: 3
Engineering Novel Features for Diabetes Complication Prediction Using Synthetic Electronic Health Records (Frontiers Media S.A., 2025-04-14)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
Diabetes significantly affects millions of people worldwide, leading to substantial morbidity, disability, and mortality rates.
Predicting diabetes-related complications from health records is crucial for early prevention and for the development of effective treatment plans. In order to predict four different complications of diabetes mellitus, i.e., retinopathy, chronic kidney disease, ischemic heart disease, and amputations, this study introduces a novel feature engineering approach. While developing the classification models, we utilize the XGBoost feature selection method and various supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree. These models were trained on synthetic electronic health records (EHR) generated by dual-adversarial autoencoders. These EHRs represent nearly 1 million synthetic patients derived from an authentic cohort of 979,308 individuals with diabetes. The variables considered in the models were the age range accompanied by chronic diseases that occur during patient visits starting from the onset of diabetes. Throughout the experiments, XGBoost and Random Forest demonstrated the best overall prediction performance. The final models, which are tailored to each complication and trained using our feature engineering approach, achieved an accuracy between 69% and 77% and an AUC between 77% and 84% using cross-validation, while the partitioned validation approach yielded an accuracy between 59% and 78% and an AUC between 66% and 85%. These findings imply that the performance of our method surpasses that of the traditional Bag-of-Features approach, highlighting the effectiveness of our approach in enhancing model accuracy and robustness.
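
Several abstracts in this collection (PriPath, GeNetOntology, miRdisNET, 3Mint) describe a Grouping, Scoring, and Modeling (G-S-M) workflow: features are grouped using biological knowledge, each group is scored by how well it separates the classes, and the top-ranked groups are used to train the final model. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The toy expression values, the gene and group names, and the leave-one-out nearest-centroid scorer are all hypothetical stand-ins; the abstracts say only that a machine learning classifier is used for scoring.

```python
from statistics import mean

# Hypothetical toy data: each sample maps a gene name to an expression value.
samples = [
    {"g1": 5.0, "g2": 5.2, "g3": 2.0, "g4": 3.0},
    {"g1": 4.8, "g2": 5.1, "g3": 3.0, "g4": 2.0},
    {"g1": 5.3, "g2": 4.9, "g3": 2.5, "g4": 2.5},
    {"g1": 1.0, "g2": 1.2, "g3": 3.0, "g4": 2.9},
    {"g1": 0.9, "g2": 1.1, "g3": 2.0, "g4": 2.1},
    {"g1": 1.2, "g2": 0.8, "g3": 2.4, "g4": 2.6},
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = disease, 0 = control (assumed encoding)

# Grouping: hypothetical knowledge-based groups (e.g., pathways or GO terms).
groups = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4"]}


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in scorer)."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([x for x, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def score_groups(samples, labels, groups):
    """Scoring: rank each group by the accuracy reached using only its genes."""
    scores = {}
    for name, genes in groups.items():
        X = [[s[g] for g in genes] for s in samples]
        scores[name] = loo_accuracy(X, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Modeling: evaluate a model on the genes of the top-ranked group only.
ranked = score_groups(samples, labels, groups)
top_genes = [g for name, _ in ranked[:1] for g in groups[name]]
model_acc = loo_accuracy([[s[g] for g in top_genes] for s in samples], labels)
print(ranked, top_genes, model_acc)
```

In this toy setup the group whose genes differ clearly between the two classes ranks first, and the ranked list plays the role of the "significant groups" output that the abstracts mention. Swapping in a different grouping source (KEGG pathways, GO terms, miRNA targets) only changes the `groups` dictionary.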
