Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395


Search Results

Now showing 1 - 10 of 26

Impact of Gene Duplicate Handling Strategies on Classification Performance and Feature Selection in Gene Expression Data
(Institute of Electrical and Electronics Engineers Inc., 2025-09-17) Kuzudisli, Cihan; Qaqish, Bahjat; Gungor, Burcu Bakir; Yousef, Malik
Enhancing Complex Disease Group Scoring with miRGediNET: A Multi-Algorithm Machine Learning Framework Based on the GSM Approach
(IEEE, 2025-06-25) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Integrating prior biological knowledge about disease-gene associations has shown significant promise for discovering new biomarkers with potential translational applications. This work investigates a multi-algorithm machine learning framework based on the Grouping-Scoring-Modeling (G-S-M) approach for improving the prediction of complex diseases. The study identifies the primary gene and miRNA interactions in various complex diseases with the help of miRGediNET, a machine-learning-based tool that integrates data from three biological databases. Whereas traditional methods treat features as independent, the G-S-M method groups genes based on their biological interactions, scores each gene group for its relevance to a disease, and models the groups' predictive capability using machine learning algorithms. Seven algorithms, including Support Vector Machine, Decision Tree, and CatBoost, were applied to eight datasets from the GEO database. Evaluated with 100-fold randomized cross-validation, the framework proved robust in ranking gene clusters and thus in identifying critical biomarkers. The results indicate the approach's potential for refining disease prediction and for helping researchers choose the algorithm that offers the best combination of biological insight and computational performance.
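A minimal sketch of the Grouping-Scoring-Modeling idea in plain Python follows. It is an illustration under stated assumptions, not miRGediNET's code: the groups (for example, a miRNA mapped to its target genes) are supplied by hand, a nearest-centroid classifier stands in for the seven algorithms the study compares, mean accuracy over random train/test splits stands in for the study's evaluation, and `score_groups`, `select_features`, and their parameters are hypothetical names.

```python
import random
from statistics import mean


def centroid_accuracy(X, y, features, train_idx, test_idx):
    """Train a nearest-centroid classifier on `features` only; return test accuracy."""
    centroids = {}
    for label in sorted(set(y[i] for i in train_idx)):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for i in test_idx:
        x = [X[i][f] for f in features]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == y[i]
    return correct / len(test_idx)


def score_groups(X, y, groups, n_iter=10, test_frac=0.3, seed=0):
    """S step: score each group by its mean accuracy over random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    per_group = {g: [] for g in groups}
    for _ in range(n_iter):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        for g, feats in groups.items():
            per_group[g].append(centroid_accuracy(X, y, feats, train, test))
    return {g: mean(s) for g, s in per_group.items()}


def select_features(groups, scores, top_k):
    """M step input: the top-k groups and the union of their features."""
    ranked = sorted(groups, key=scores.get, reverse=True)[:top_k]
    return ranked, sorted({f for g in ranked for f in groups[g]})
```

A final model would then be trained on the features returned by `select_features`; that Modeling step is omitted here for brevity.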
Exploring Microbiome Signatures in Autism Spectrum Disorder via Grouping-Scoring Based Machine Learning
(IEEE, 2025-06-25) Temiz, Mustafa; Ersoz, Nur Sebnem; Yousef, Malik; Bakir-Gungor, Burcu
The rapid increase in omics data production has raised the importance of machine learning (ML) methods for analyzing these data. In particular, metagenomic data are increasingly used in the diagnosis, prognosis, and treatment of diseases. Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that emerges in early childhood and persists throughout life. This study aims to improve ML performance, reduce computational cost, and achieve strong classification performance using a small number of metagenomic features. In addition, microBiomeGSM is applied to metagenomic data to predict disease status and to identify ASD-associated biomarkers. Classification is performed at three taxonomic levels (genus, family, and order) using the relative abundance values of species. The best performance (an AUC of 0.95) was obtained at the order level with microBiomeGSM, using an average of 416 features. The identified ASD-related taxa are presented.
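In this kind of grouping, species-level features are organized by a taxonomic rank before the groups are scored. The sketch below assumes a hypothetical lineage table mapping each species to its genus, family, and order, and shows only that grouping step; `group_by_rank` and all species and taxon names are placeholders, not data or code from the study.

```python
from collections import defaultdict


def group_by_rank(lineage, rank):
    """Map each taxon at `rank` (e.g. "order") to the sorted species it contains."""
    groups = defaultdict(list)
    for species, ranks in lineage.items():
        groups[ranks[rank]].append(species)
    return {taxon: sorted(members) for taxon, members in groups.items()}


# Toy lineage table (hypothetical species and taxa, for illustration only).
lineage = {
    "sp1": {"genus": "G1", "family": "F1", "order": "O1"},
    "sp2": {"genus": "G1", "family": "F1", "order": "O1"},
    "sp3": {"genus": "G2", "family": "F2", "order": "O1"},
    "sp4": {"genus": "G3", "family": "F3", "order": "O2"},
}
order_groups = group_by_rank(lineage, "order")
```

Each resulting group (a list of species whose relative abundances serve as features) would then be scored as a unit, and coarser ranks such as order yield fewer, larger groups.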
Citation - Scopus: 2
miRcorrNetPro: Unraveling Algorithmic Insights Through Cross-Validation in Multi-Omics Integration for Comprehensive Data Analysis
(Institute of Electrical and Electronics Engineers Inc., 2023-12-05) Ünlü Yazici, Miray; Yousef, Malik; Marron, J. S.; Bakir-Güngör, Burcu
High-throughput omics technologies facilitate the investigation of the regulatory mechanisms of complex diseases. Along these lines, researchers have developed promising tools and methods that extend our understanding at the molecular and functional levels. One such tool, miRcorrNet, performs an integrative analysis of microRNA (miRNA) and gene expression profiles using a machine learning (ML) approach to identify significant miRNA groups and their associated target genes. In this study, we propose miRcorrNetPro, which extends miRcorrNet by tracking group scores, rankings, and other information across cross-validation iterations. Heatmap visualizations provide new insights into the collective behavior of clusters of groups in cellular signaling and hence facilitate the detection of potential biomarkers for the disease under investigation. Although miRcorrNetPro is designed as a generic tool, here we present our findings and potential miRNA biomarkers for Breast Cancer (BRCA). The miRcorrNetPro tool and all supplementary files are available at https://github.com/Miray-Unlu/miRcorrNetPro.
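Tracking group scores through cross-validation amounts to recording, for every iteration, where each group ranks; the resulting group-by-iteration table is the kind of matrix a heatmap would display. The following is a hypothetical sketch of that bookkeeping, not miRcorrNetPro's code; `rank_matrix`, `mean_ranks`, and the toy scores are assumptions.

```python
from statistics import mean


def rank_matrix(scores_per_iter):
    """Build a group x iteration table of ranks (1 = best) from per-iteration scores.

    Assumes every group is scored in every cross-validation iteration.
    """
    groups = sorted(scores_per_iter[0])
    matrix = {g: [] for g in groups}
    for scores in scores_per_iter:
        # Higher score ranks first; ties are broken alphabetically for determinism.
        ordered = sorted(scores, key=lambda g: (-scores[g], g))
        for rank, g in enumerate(ordered, start=1):
            matrix[g].append(rank)
    return matrix


def mean_ranks(matrix):
    """Average rank of each group across iterations (lower = more consistently on top)."""
    return {g: mean(ranks) for g, ranks in matrix.items()}


# Made-up scores of three miRNA groups in two cross-validation iterations.
iterations = [{"m1": 0.9, "m2": 0.7, "m3": 0.8}, {"m1": 0.85, "m2": 0.9, "m3": 0.6}]
matrix = rank_matrix(iterations)
```

Rows whose ranks stay low across iterations point to groups that are consistently informative, while rows that fluctuate suggest groups whose score depends on the particular split.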
The Effect of Different Classifiers on Recursive Cluster Elimination in the Analysis of Transcriptomic Data
(Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Bulut, Nurten; Bakir-Güngör, Burcu; Qaqish, Bahjat F.; Yousef, Malik
Gene expression datasets with small sample sizes and large numbers of genes are common in genetic studies. In such high-dimensional data, identifying the genes that distinguish between disease states is challenging. Feature selection (FS) is a useful approach for dealing with high dimensionality. Support Vector Machines Recursive Cluster Elimination (SVM-RCE) is an FS technique for high-dimensional data that has been used to identify clusters of genes whose expression levels correlate with pathological state. A key step in SVM-RCE uses an SVM classifier to assign each gene cluster an area under the curve (AUC) score based on its ability to predict class labels. In this study, we investigate alternative classifiers for this cluster-scoring step, comparing Support Vector Machines, Random Forest, XGBoost, Naive Bayes, and linear logistic regression. Besides AUC performance, the algorithms are compared in terms of the number of genes selected at different levels of clustering and their running time.
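The recursive elimination loop with a swappable scoring step can be sketched as follows. This is a simplified, hypothetical version: `cluster_fn` stands in for the clustering step (for example, k-means) and `score_fn` for the classifier-based cluster score (in the study, the AUC of an SVM, Random Forest, XGBoost, Naive Bayes, or logistic regression model); the toy clustering and relevance values in the usage lines are made up.

```python
def recursive_cluster_elimination(features, cluster_fn, score_fn, keep_frac=0.5, min_features=2):
    """Repeatedly cluster the surviving features, score each cluster, and keep the best ones.

    cluster_fn(features) -> list of clusters (lists of features)
    score_fn(cluster)    -> number, higher is better (e.g. a classifier's AUC)
    Returns the final feature list and the feature list after each round.
    """
    history = []
    while len(features) > min_features:
        clusters = cluster_fn(features)
        ranked = sorted(clusters, key=score_fn, reverse=True)
        kept = ranked[:max(1, int(len(ranked) * keep_frac))]
        survivors = sorted(f for cluster in kept for f in cluster)
        if len(survivors) == len(features):
            break  # nothing was eliminated, so stop instead of looping forever
        features = survivors
        history.append(features)
    return features, history


# Toy usage with made-up per-gene relevance values.
relevance = {0: 0.1, 1: 0.2, 2: 0.9, 3: 0.8, 4: 0.3, 5: 0.1, 6: 0.7, 7: 0.6}


def pairs(fs):
    """Hypothetical clustering: group adjacent features in pairs."""
    return [fs[i:i + 2] for i in range(0, len(fs), 2)]


def mean_relevance(cluster):
    """Hypothetical cluster score standing in for a classifier's AUC."""
    return sum(relevance[f] for f in cluster) / len(cluster)


final, rounds = recursive_cluster_elimination(list(range(8)), pairs, mean_relevance)
```

Swapping classifiers only changes `score_fn`, which is why the choice of classifier can affect both the genes that survive and the running time of each round.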
Citation - WoS: 1
Citation - Scopus: 1
TextNetTopics-SFTS-SBTS: TextNetTopics Scoring Approaches Based Sequential Forward and Backward
(Springer International Publishing AG, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is a text classification-based topic modeling approach that performs topic selection, rather than word selection, to train a machine learning algorithm. However, a main limitation of TextNetTopics is that its scoring component (the S component) assesses and ranks each topic independently, neglecting potential relationships between topics. To address this limitation and improve classification performance, this study introduces TextNetTopics-SFTS-SBTS, an enhancement that integrates two novel scoring approaches, Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which account for topic interactions by assessing sets of topics simultaneously. This integration aims to streamline topic selection and improve classifier efficiency for text classification. Results on three datasets offer insights into how the effectiveness of the new scoring mechanisms depends on the dataset and on the number of topics involved in the analysis.
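Sequential forward and backward scoring can be sketched as greedy searches over topic sets. The sketch assumes a hypothetical `evaluate` callable that returns a classification score for a set of topics (in practice this would train and cross-validate a classifier on the words of those topics). The toy `evaluate` below adds a bonus when topics `t2` and `t3` appear together, to show how set-based scoring can rank `t3` above `t4` even though `t3` scores lower on its own.

```python
def sequential_forward(topics, evaluate):
    """Rank topics by greedily adding the one that maximizes evaluate(selected + [t])."""
    selected, remaining, trace = [], list(topics), []
    while remaining:
        best = max(remaining, key=lambda t: evaluate(selected + [t]))
        selected.append(best)
        remaining.remove(best)
        trace.append(evaluate(selected))  # score of the set after each addition
    return selected, trace


def sequential_backward(topics, evaluate):
    """Rank topics by greedily removing the one whose removal hurts evaluate() least."""
    current, removed = list(topics), []
    while len(current) > 1:
        worst = max(current, key=lambda t: evaluate([u for u in current if u != t]))
        current.remove(worst)
        removed.append(worst)
    return current + removed[::-1]  # best-to-worst order


# Toy scoring function (made up): additive weights plus an interaction bonus.
weights = {"t1": 6, "t2": 3, "t3": 1, "t4": 2}


def evaluate(topic_set):
    base = sum(weights[t] for t in topic_set)
    return base + (4 if {"t2", "t3"} <= set(topic_set) else 0)


forward_order, forward_trace = sequential_forward(list(weights), evaluate)
backward_order = sequential_backward(list(weights), evaluate)
```

Scoring each topic alone would place `t4` (2) above `t3` (1); both sequential procedures instead rank `t3` third because it pays off only together with `t2`.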
Citation - Scopus: 1
TextNetTopics_TIS: Enhancing TextNetTopics With Random Forest-Based Topic Importance Scoring
(Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Voskergian, Daniel; Bakir-Güngör, Burcu; Yousef, Malik
TextNetTopics is an innovative Latent Dirichlet Allocation-based topic selection method for training text classification models. A main limitation is its computationally intensive scoring mechanism, especially when many topics are involved. This mechanism trains a machine learning model (i.e., Random Forest) on each topic using Monte Carlo cross-validation and assigns a score based on a chosen performance metric (e.g., accuracy or F1-score). Moreover, the resulting score does not account for interactions among the features in all topics. This paper presents a new topic-scoring mechanism called Topic Importance Scoring. This computationally efficient approach trains a single Random Forest model on all topics simultaneously and uses the extracted feature importance values to give each topic a score reflecting its classification potential. Experiments on three diverse datasets show that the proposed method outperforms Topic Performance Scoring, the mechanism used in the original TextNetTopics method.
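Topic Importance Scoring can be sketched as turning per-word importances from one model into per-topic scores. Summing the importances of a topic's words is an assumption here (the abstract does not state the aggregation), and in practice the importance values would come from a trained Random Forest, for example scikit-learn's `RandomForestClassifier.feature_importances_` mapped back to words. The function names, topics, and numbers below are hypothetical.

```python
def topic_importance_scores(topics, importance):
    """Score each topic as the summed importance of its words.

    `importance` maps word -> importance from one model trained on all words;
    words absent from the map contribute 0.
    """
    return {name: sum(importance.get(w, 0.0) for w in words) for name, words in topics.items()}


def rank_topics(scores):
    """Topics ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up topics and word importances.
topics = {"T1": ["virus", "vaccine"], "T2": ["trial", "dose"], "T3": ["cell"]}
importance = {"virus": 0.3, "vaccine": 0.2, "trial": 0.1, "dose": 0.05, "cell": 0.35}
scores = topic_importance_scores(topics, importance)
```

Because the model is trained once on all words, scoring costs one fit instead of one cross-validated fit per topic.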
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling
(Springer International Publishing AG, 2025) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics uses cohesive topics extracted via Latent Dirichlet Allocation as features for document representation, reducing dimensionality while preserving the semantic structure of the text. This study evaluates TextNetTopics with multiple machine learning algorithms in the M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further improve classification performance, we introduce TextNetTopics+, an ensemble-based extension that uses both hard and soft voting to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets show that ensemble learning in TextNetTopics+ significantly outperforms the individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.
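Hard and soft voting differ in what they combine: hard voting takes the majority of the predicted labels, while soft voting averages the predicted class probabilities. A minimal standard-library sketch follows; `hard_vote` and `soft_vote` are hypothetical names, and the probabilities are made up so that the two rules disagree.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the labels predicted by each classifier.

    Counter.most_common orders equal counts by first occurrence, so ties go to
    the label predicted first.
    """
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities across classifiers and return the top class."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get), avg


# Made-up outputs of three classifiers for one document with classes "A" and "B".
probs = [{"A": 0.6, "B": 0.4}, {"A": 0.55, "B": 0.45}, {"A": 0.1, "B": 0.9}]
labels = [max(p, key=p.get) for p in probs]  # each classifier's own prediction
```

Here two weakly confident classifiers outvote a strongly confident one under hard voting ("A"), whereas soft voting lets the confident classifier dominate the average ("B").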
Citation - Scopus: 1
SEMANT: Feature Group Selection Utilizing fastText-Based Semantic Word Grouping, Scoring, and Modeling Approach for Text Classification
(Springer International Publishing AG, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
Text classification is challenging because of its high-dimensional feature space, which makes an effective feature selection scheme essential. In this study, we present SEMANT, a novel hybrid filter-wrapper feature selection method that combines the filter-based Chi-Square test with the wrapper-based G-S-M approach. SEMANT incorporates fastText neural word embedding similarities to promote greater semantic inclusion when selecting features for text classification tasks. The proposed method was evaluated on the WOS-5736 and LitCovid datasets and compared with TextNetTopics, a topic modeling-based topic selection algorithm for text classification. Experimental results confirm that the proposed approach outperforms TextNetTopics.
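The Chi-Square filter step scores each term by how strongly its presence depends on the class, using the standard 2x2 contingency-table statistic `N * (A*D - B*C)**2 / ((A+B) * (C+D) * (A+C) * (B+D))`, where A counts documents that contain the term and belong to the target class, B those that contain it but do not, C those in the class without the term, and D the rest. The sketch below covers only this filter step, not SEMANT's fastText-based grouping; `chi_square` and the toy documents are hypothetical.

```python
def chi_square(docs, labels, term, positive):
    """Chi-square statistic for term presence vs. membership in class `positive`.

    docs are sets of words; a/b/c/d are the cells of the 2x2 contingency table.
    """
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        present, in_class = term in words, label == positive
        if present and in_class:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy corpus (made up): each document is a set of words.
docs = [{"virus", "cell"}, {"virus"}, {"cell"}, {"market"}]
labels = ["bio", "bio", "bio", "econ"]
ranked = sorted(["virus", "cell", "market"], key=lambda t: chi_square(docs, labels, t, "bio"), reverse=True)
```

"market" appears only in the single non-"bio" document, so it separates the classes perfectly and receives the maximum value, N = 4.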
Citation - WoS: 8
Citation - Scopus: 12
SVM-RCE-R Optimization of Scoring Function for SVM-RCE
(Springer International Publishing AG, 2021) Yousef, Malik; Jabeer, Amhar; Bakir-Gungor, Burcu
Classifying gene expression data is challenging because of its high dimensionality and relatively small sample size. Various feature selection approaches have been used to overcome this issue, SVM-RCE being one of the more successful. This study continues two previous studies, SVM-RCE and SVM-RCE-R. SVM-RCE-R proposed a new scoring function for the clusters and showed that performance improved for some combinations of weights. The aim of this study is to find the optimal weights for the SVM-RCE-R scoring function using optimization approaches. We found that optimizing these weights improves the performance of SVM-RCE-R in most cases, and in some cases accuracy and AUC increase dramatically, by 10%. Improving the algorithm's performance makes it more likely that we can extract the subset of genes related to the class of a microarray sample.
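Finding scoring-function weights can be sketched as a search over weight vectors. The sketch assumes the cluster score is a weighted sum of performance metrics and uses exhaustive grid search over weights that sum to 1 as a stand-in for the study's optimization approaches, which the abstract does not name. `weighted_score`, `grid_search_weights`, and the `objective` callable (which in practice would run SVM-RCE-R with the given weights and return, say, its accuracy) are hypothetical.

```python
from itertools import product


def weighted_score(metrics, weights):
    """Cluster score as a weighted sum of metric values (assumed form of the scoring function)."""
    return sum(weights[name] * metrics[name] for name in weights)


def grid_search_weights(names, objective, step=0.25):
    """Try every weight vector on a grid with entries in [0, 1] summing to 1; keep the best."""
    n_steps = round(1 / step)
    best, best_val = None, float("-inf")
    for combo in product(range(n_steps + 1), repeat=len(names)):
        if sum(combo) != n_steps:
            continue  # skip vectors whose weights do not sum to 1
        weights = {name: c / n_steps for name, c in zip(names, combo)}
        value = objective(weights)
        if value > best_val:
            best, best_val = weights, value
    return best, best_val


# Toy objective (made up): pretend downstream accuracy peaks when the accuracy weight is 0.25.
best, value = grid_search_weights(["acc", "auc"], lambda w: -(w["acc"] - 0.25) ** 2)
```

Grid search grows exponentially with the number of metrics, and each evaluation would require a full SVM-RCE-R run, which is why smarter optimizers are attractive for this problem.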
