WoS-Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/394
14 results
Search Results
Conference Object
Enhancing Complex Disease Group Scoring with Mirgedinet: A Multi-Algorithm Machine Learning Framework Based on the GSM Approach (IEEE, 2025-06-25)
Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Integrating biological prior knowledge for disease gene associations has shown significant promise in discovering new biomarkers with potential translational applications. This work investigates the application of a multi-algorithm machine learning framework based on the Grouping-Scoring-Modeling (G-S-M) approach for improving the prediction of complex diseases. The study identifies the primary gene and miRNA interactions in various complex diseases with the help of miRGediNET, a machine learning-based tool that integrates data from three biological databases. Whereas traditional methods treat features as independent, the G-S-M method aggregates genes based on biological interactions, scores the gene groups for a disease, and models their predictive capability using advanced machine learning algorithms. In this research paper, seven algorithms, including Support Vector Machine, Decision Tree, and CatBoost, were applied to eight datasets extracted from the GEO database. The framework proved very robust in ranking gene clusters, and thus in predicting critical biomarkers, under 100-fold randomized cross-validation. The results indicate this approach's high potential for refining disease prediction and for supporting research on choosing the best algorithm that can provide biological insights and computational advances.

Conference Object
Exploring Microbiome Signatures in Autism Spectrum Disorder via Grouping-Scoring Based Machine Learning (IEEE, 2025-06-25)
Temiz, Mustafa; Ersoz, Nur Sebnem; Yousef, Malik; Bakir-Gungor, Burcu
The rapid increase in omic data production has increased the importance of machine learning (ML) methods for analyzing these data.
In particular, the use of metagenomic data in the diagnosis, prognosis and treatment of diseases is becoming widespread. Autism Spectrum Disorder (ASD) is a neurodevelopmental disease that begins in early childhood and persists throughout life. The aim of this study is to increase ML performance, reduce computational costs and achieve successful classification performance using a small number of metagenomic features. In addition, disease prediction is performed and ASD-associated biomarkers are identified by applying microBiomeGSM to metagenomic data. Classification is performed at three different taxonomic levels (genus, family and order) using the relative abundance values of species. The best performance (0.95 AUC) was obtained at the order taxonomic level using an average of 416 features with microBiomeGSM. The identified ASD-related taxonomic species are presented.

Conference Object
Citation - WoS: 1; Citation - Scopus: 1
Textnettopics-SFTS-SBTS Textnettopics Scoring Approaches Based Sequential Forward and Backward (Springer International Publishing AG, 2024)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is a text classification-based topic modeling approach that performs topic selection rather than word selection to train a machine learning algorithm. However, one main limitation of TextNetTopics is that its scoring component (the S component) assesses each topic independently and ranks them accordingly, neglecting the potential relationship between topics. In order to address this limitation and improve the classification performance, this study introduces an enhancement to TextNetTopics. TextNetTopics-SFTS-SBTS integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. This integration aims to streamline the topic selection process and enhance classifier efficiency for text classification.
The results obtained across three datasets offer valuable insights into the context-dependent effectiveness of the new scoring mechanisms across diverse datasets and varying numbers of topics involved in the analysis.

Conference Object
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling (Springer International Publishing AG, 2025)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics selects cohesive topics extracted via Latent Dirichlet Allocation as features for document representation, effectively reducing dimensionality while preserving the semantic structure of the text. This study evaluates the performance of TextNetTopics utilizing multiple machine learning algorithms in the M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further enhance classification performance, we introduce TextNetTopics+, an ensemble-based extension that leverages both hard voting and soft voting mechanisms to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets demonstrate that ensemble learning in TextNetTopics+ significantly outperforms individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.

Conference Object
Citation - Scopus: 1
Semant - Feature Group Selection Utilizing Fasttext-Based Semantic Word Grouping, Scoring, and Modeling Approach for Text Classification (Springer International Publishing AG, 2024)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
Text classification presents a challenge due to its high-dimensional feature space.
As such, devising an effective feature selection scheme is essential. In this study, we present SEMANT, a novel hybrid filter-wrapper feature selection method that utilizes filter-based Chi-Square and the wrapper-based G-S-M approach. SEMANT incorporates fastText neural word embedding similarities to promote greater semantic inclusion in the selection of features for text classification tasks. The performance of the proposed method was investigated on the WOS-5736 and LitCovid datasets and compared with TextNetTopics, a topic modeling-based topic selection algorithm for text classification. Experimental results confirm that the proposed approach outperforms its alternative.

Conference Object
Citation - WoS: 8; Citation - Scopus: 12
SVM-RCE-R Optimization of Scoring Function for SVM-RCE (Springer International Publishing AG, 2021)
Yousef, Malik; Jabeer, Amhar; Bakir-Gungor, Burcu
Gene expression data classification is challenging because of its high dimensionality and relatively small sample size. Different feature selection approaches have been used to overcome this issue, SVM-RCE being one of the more successful approaches. This study is a continuation of two previous research studies, SVM-RCE and SVM-RCE-R. SVM-RCE-R suggests a new approach in the scoring function for the clusters, showing that for some combinations of weights the performance was improved. The aim of this study is to find the optimal weights for the scoring function suggested in the study of SVM-RCE-R using optimization approaches. We have discovered that finding the optimal weights for the scoring function would improve the performance of SVM-RCE-R in most cases. We have shown that in some cases the performance increased dramatically, by 10% in terms of accuracy and AUC.
By increasing the performance of the algorithm, it is more likely that we can extract subsets of genes related to the class association of a microarray sample.

Conference Object
Citation - WoS: 1; Citation - Scopus: 1
Prediction of Type 2 Diabetes Using Metagenomic Data and Identification of Taxonomic Biomarkers (IEEE, 2024-05-15)
Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
Nowadays, -omics data on diseases are generated at different molecular levels, and analyzing these data with machine learning methods is a popular research topic. Among these data, the use of metagenomic data to facilitate the diagnosis, detection and treatment of diseases is increasing day by day. Type 2 diabetes (T2D) is a chronic disease characterized by insulin resistance and progressive dysfunction of pancreatic beta cells. While the number of people with diabetes is increasing by around 8% annually, the cost of treating the disease is rising by 18% per year. Therefore, the number of studies on the diagnosis, development and progression of T2D is increasing over time. The aim of this study is to achieve higher machine learning performance by using fewer metagenomic features and to achieve better classification performance by reducing computational costs. In this study, we compare the performance of three different methods using T2D-related metagenomic data. First, the MetaPhlAn tool is used to calculate the taxonomic species and their relative abundances in each sample. The SVM-RCE, RCE-IFE and microBiomeGSM tools used in this study are methods that perform classification by grouping and scoring features and are known to work well on complex datasets. In this study, the best results were obtained with the RCE-IFE tool, which achieved an AUC of 0.72 using an average of 125 features.
In addition, key taxonomic species identified by these tools as associated with T2D are presented in comparison to the literature.

Article
Citation - WoS: 5; Citation - Scopus: 5
Novel Antimicrobial Peptide Design Using Motif Match Score Representation (IEEE Computer Soc, 2024-11)
Soylemez, Ummu Gulsum; Yousef, Malik; Kesmen, Zulal; Bakir-Gungor, Burcu
Antimicrobial peptides (AMPs) have drawn the interest of researchers since they offer an alternative to traditional antibiotics in the fight against antibiotic resistance and exhibit additional pharmaceutically significant properties. Recently, computational approaches have attempted to reveal how antibacterial activity is determined from a machine learning perspective, aiming to find the biological cues or characteristics that control antimicrobial activity by incorporating motif match scores. This study is dedicated to the development of a machine learning framework aimed at devising novel antimicrobial peptide (AMP) sequences potentially effective against Gram-positive/Gram-negative bacteria. To classify newly generated sequences as either AMP or non-AMP, various classification models were trained. These novel sequences underwent validation utilizing the "DBAASP: strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented herein represent a significant stride in this computational research, streamlining the process of AMP creation or modification within wet lab environments.

Conference Object
Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes (IEEE, 2024-05-15)
Ersoz, Nur Sebnem; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
The human gut microbiome comprises over 10 trillion microbes and plays important roles in maintaining metabolism and body homeostasis and in shaping immune function.
Metagenomics, which studies genomic data from clinical and environmental samples, is crucial in understanding the interplay between the host and the gut microbiome. Recently, functional profiling of metagenomes has helped to identify alterations in microbial functions, particularly enzyme-encoding genes. Colorectal cancer (CRC) is known as one of the leading causes of cancer-related deaths. In this study, we aimed to find the CRC-associated enzymes by analyzing metagenomic data with different machine learning methods. A total of 1262 samples including CRC and control groups from different countries were used in this study. This dataset was obtained by functionally profiling metagenomics data and estimating community-level enzyme commission (EC) abundance values. For the analysis of this dataset, the RCE-IFE and SVM-RCE machine learning methods, which are group-based feature selection methods, were compared with 6 different individual feature selection methods. Monte Carlo cross-validation repeated 10 times was used in our experiments. It was observed that the RCE-IFE, Extreme Gradient Boosting and Select K Best methods provided similarly strong performances. Notably, besides its high performance, the group-based feature selection method RCE-IFE, unlike TFS, grouped enzymes into clusters and then identified biologically relevant CRC-associated enzymes.

Conference Object
Leveraging MicroRNA-Gene Associations With Mirgedinet: An Intelligent Approach for Enhanced Classification of Breast Cancer Molecular Subtypes (Springer International Publishing AG, 2025)
Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Understanding the molecular subtypes of breast cancer is crucial for advancing targeted therapies and precision medicine.
For the BRCA molecular subtype prediction problem, this study employs miRGediNET, a machine learning approach that integrates data from miRTarBase, DisGeNET, and HMDD databases to investigate shared gene associations between microRNA (miRNA) activity and disease mechanisms. Using the BRCA LumAB_Her2Basal dataset, we evaluate miRGediNET's performance against traditional feature selection methods, including CMIM, mRmR, Information Gain (IG), SelectKBest (SKB), Fast Correlation-Based Filter (FCBF), and XGBoost (XGB). These feature selection techniques were assessed using various classification algorithms including Random Forest (RF), Support Vector Machine (SVM), LogitBoost, Decision Tree, and AdaBoost, all executed with default parameters. The feature selection methods were tested using Monte Carlo Cross-Validation, where performance metrics obtained for each iteration were averaged to ensure robustness. Our findings reveal that miRGediNET outperforms traditional methods in accuracy and Area Under the Curve (AUC), emphasizing its superior capability to identify key genes that bridge miRNA interactions and breast cancer mechanisms. Notably, both miRGediNET and Information Gain (IG) feature selection consistently identified ESR1, a critical biomarker frequently reported in recent research associated with breast cancer prognosis and resistance to endocrine therapies. This integrative approach provides deeper biological insights into miRNA-disease interactions, paving the way for enhanced patient stratification, biomarker discovery, and personalized medicine strategies. The miRGediNET tool, developed on the KNIME platform, offers a practical resource for further exploration in the field of bioinformatics and oncology.
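
The Monte Carlo Cross-Validation procedure mentioned in the abstract above (repeated random train/test splits, with the metric from each iteration averaged) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' KNIME workflow: the nearest-centroid classifier is a simple stand-in for the classifiers named in the abstract, and the function names, the number of iterations, the split ratio and the toy data are all hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Compute one mean vector (centroid) per class label seen in y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def monte_carlo_cv(X, y, n_iterations=10, test_fraction=0.1, seed=0):
    """Average test accuracy over repeated random train/test splits.

    Unlike k-fold cross-validation, each iteration draws a fresh random
    split, so a sample may land in the test set more than once or never.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    scores = []
    for _ in range(n_iterations):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = nearest_centroid_fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        correct = sum(
            nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx
        )
        scores.append(correct / n_test)
    return mean(scores), scores


# Toy data (hypothetical): two well-separated groups of 2-feature samples.
X_toy = [[i * 0.1, i * 0.1] for i in range(20)] + \
        [[10 + i * 0.1, 10 + i * 0.1] for i in range(20)]
y_toy = [0] * 20 + [1] * 20

avg_accuracy, per_iteration = monte_carlo_cv(
    X_toy, y_toy, n_iterations=10, test_fraction=0.25, seed=1
)
print(avg_accuracy, per_iteration)
```

In a comparison like the one described, the same loop would wrap each feature selection method and classifier pair, with feature selection fitted on the training portion of each split only, so that the averaged metric is not inflated by information from the test samples.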
