Moleküler Biyoloji ve Genetik Bölümü Koleksiyonu

Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/209

Browse

Now showing 1 - 5 of 5

Classification of Breast Cancer Molecular Subtypes with Grouping-Scoring-Modeling Approach that Incorporates Disease-Disease Association Information
(IEEE Xplore, 2024) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Bakir-Gungor, Burcu
This study uses modern sequencing technology and large biological databases to investigate the molecular intricacies of complicated diseases like cancer. Using gene expression databases and biomarkers, the research aims to improve breast cancer molecular subtype identification for better patient outcomes. Using BRCA LumAB_ Her2Basal dataset, this study compares an integrative machine learning-based strategy (GediNET) to traditional feature selection approaches across machine learning classifiers. GediNET excels at uncovering crucial disease-disease connections and potential biomarkers using the Grouping-Scoring-Modeling (GSM) approach, which favors gene groupings above individual genes. Our comparative analysis highlights GediNET's exceptional performance, notably in terms of accuracy and Area Under the Curve metrics, underscoring its effectiveness in uncovering the genetic intricacies of breast cancer. GediNET's promise to improve disease classification and biomarker identification by improving biological mechanism understanding goes beyond exceeding traditional approaches. The work shows that GediNET's integrative method can promote bioinformatics research by identifying the most informative genes associated with certain diseases, enabling focused and customized medicine.
eTNT: Enhanced TextNetTopics with Filtered LDA Topics and Sequential Forward / Backward Topic Scoring Approaches
(SCIENCE & INFORMATION-SAI ORGANIZATION LTD, 2024) Voskergian, Daniel; Jayousi, Rashid; Bakir-Gungor, Burcu; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Bakir-Gungor, Burcu
TextNetTopics is a novel text classification-based topic modelling approach that focuses on topic selection rather than individual word selection to train a machine learning algorithm. However, one key limitation of TextNetTopics is its scoring component, which evaluates each topic in isolation and ranks them accordingly, ignoring the potential relationships between topics. In addition, the chosen topics may contain redundant or irrelevant features, potentially increasing the feature set size and introducing noise that can degrade the overall model performance. To address these limitations and improve the classification performance, this study introduces an enhancement to TextNetTopics. eTNT integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. Moreover, it incorporates a filtering component that aims to enhance topics' quality and discriminative power by removing non-informative features from each topic using Random Forest feature importance values. These integrations aim to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained from the WOS-5736, LitCovid, and MultiLabel datasets provide valuable insights into the superior effectiveness of eTNT compared to its counterpart, TextNetTopics.
GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning
(FRONTIERS MEDIA SA, 2023) Ersoz, Nur Sebnem; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0003-3343-9936; 0000-0002-2272-6270; AGÜ, Yaşam ve Doğa Bilimleri Fakültesi, Moleküler Biyoloji ve Genetik Bölümü; Ersoz, Nur Sebnem; Bakir-Gungor, Burcu
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data
(MDPI, 2025) Bakir-Gungor, Burcu; Ersoz, Nur Sebnem; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Yaşam ve Doğa Bilimleri Fakültesi, Moleküler Biyoloji ve Genetik Bölümü; Bakir-Gungor, Burcu; Ersoz, Nur Sebnem
Advances in metagenomics have revolutionized our ability to elucidate links between the microbiome and human diseases. Colorectal cancer (CRC), a leading cause of cancer-related mortality worldwide, has been associated with dysbiosis of the gut microbiome. This study aims to develop a method for identifying CRC-associated microbial enzymes by incorporating biological domain knowledge into the feature selection process. Conventional feature selection techniques often evaluate features individually and fail to leverage biological knowledge during metagenomic data analysis. To address this gap, we propose the enzyme commission (EC)-nomenclature-based Grouping-Scoring-Modeling (G-S-M) method, which integrates biological domain knowledge into feature grouping and selection. The proposed method was tested on a CRC-associated metagenomic dataset collected from eight different countries. Community-level relative abundance values of enzymes were considered as features and grouped based on their EC categories to provide biologically informed groupings. Our findings in randomized 10-fold cross-validation experiments imply that glycosidases, CoA-transferases, hydro-lyases, oligo-1,6-glucosidase, crotonobetainyl-CoA hydratase, and citrate CoA-transferase enzymes can be associated with CRC development as part of different molecular pathways. These enzymes are mostly synthesized by Eschericia coli, Salmonella enterica, Klebsiella pneumoniae, Staphylococcus aureus, Streptococcus pneumoniae, and Clostridioides dificile. Comparative evaluation experiments showed that the proposed model consistently outperforms traditional feature selection methods paired with various classifiers.
TextNetTopics-SFTS-SBTS: TextNetTopics Scoring Approaches Based Sequential Forward and Backward
(Springer Science and Business Media Deutschland GmbH, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik; 0009-0005-7544-9210; 0000-0002-2272-6270; 0000-0001-8780-6303; AGÜ, Yaşam ve Doğa Bilimleri Fakültesi, Moleküler Biyoloji ve Genetik Bölümü; Bakir-Gungor, Burcu
TextNetTopics is a text classification-based topic modeling approach that performs topic selection rather than word selection to train a machine learning algorithm. However, one main limitation of TextNetTopics is that its scoring component (the S component) assesses each topic independently and ranks them accordingly, neglecting the potential relationship between topics. In order to address this limitation and improve the classification performance, this study introduces an enhancement to TextNetTopics. TextNetTopics-SFTS-SBTS integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. This integration aims to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained across three datasets offer valuable insights into the context-dependent effectiveness of the new scoring mechanisms across diverse datasets and varying numbers of topics involved in the analysis.

Browse

Browsing Moleküler Biyoloji ve Genetik Bölümü Koleksiyonu by Author "0000-0002-2272-6270"