Scopus-Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395
Search Results (6 results)
Conference Object
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling (Springer International Publishing AG, 2025)
Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics selects cohesive topics extracted via Latent Dirichlet Allocation (LDA) as features for document representation, reducing dimensionality while preserving the semantic structure of the text. This study evaluates the performance of TextNetTopics using multiple machine learning algorithms in its M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further improve classification performance, we introduce TextNetTopics+, an ensemble-based extension that uses both hard voting and soft voting to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets demonstrate that ensemble learning in TextNetTopics+ significantly outperforms the individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.

Conference Object | Citation - WoS: 8 | Citation - Scopus: 12
SVM-RCE-R Optimization of Scoring Function for SVM-RCE (Springer International Publishing AG, 2021)
Yousef, Malik; Jabeer, Amhar; Bakir-Gungor, Burcu
Classifying gene expression data is challenging because such data have high dimensionality and a relatively small sample size. Different feature selection approaches have been used to overcome this issue, and SVM-RCE is one of the more successful ones. This study continues two previous research studies, SVM-RCE and SVM-RCE-R.
SVM-RCE-R introduced a new scoring function for the clusters and showed that performance improved for some combinations of weights. The aim of this study is to find the optimal weights for the scoring function proposed in SVM-RCE-R using optimization approaches. We found that using the optimal weights for the scoring function improves the performance of SVM-RCE-R in most cases. In some cases, performance increased dramatically, by 10% in terms of accuracy and AUC. By increasing the performance of the algorithm, we are more likely to extract gene subsets related to the class association of a microarray sample.

Conference Object
Leveraging MicroRNA-Gene Associations With miRGediNET: An Intelligent Approach for Enhanced Classification of Breast Cancer Molecular Subtypes (Springer International Publishing AG, 2025)
Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Understanding the molecular subtypes of breast cancer is crucial for advancing targeted therapies and precision medicine. For the BRCA molecular subtype prediction problem, this study employs miRGediNET, a machine learning approach that integrates data from the miRTarBase, DisGeNET, and HMDD databases to investigate shared gene associations between microRNA (miRNA) activity and disease mechanisms. Using the BRCA LumAB_Her2Basal dataset, we evaluate miRGediNET's performance against traditional feature selection methods, including CMIM, mRMR, Information Gain (IG), SelectKBest (SKB), Fast Correlation-Based Filter (FCBF), and XGBoost (XGB). These feature selection techniques were assessed with several classification algorithms, including Random Forest (RF), Support Vector Machine (SVM), LogitBoost, Decision Tree, and AdaBoost, all run with default parameters.
The feature selection methods were tested with Monte Carlo cross-validation, and the performance metrics obtained in each iteration were averaged to ensure robustness. Our findings show that miRGediNET outperforms traditional methods in accuracy and Area Under the Curve (AUC), underscoring its ability to identify key genes that bridge miRNA interactions and breast cancer mechanisms. Notably, both miRGediNET and Information Gain (IG) feature selection consistently identified ESR1, a critical biomarker frequently reported in recent research on breast cancer prognosis and resistance to endocrine therapies. This integrative approach provides deeper biological insight into miRNA-disease interactions, paving the way for improved patient stratification, biomarker discovery, and personalized medicine strategies. The miRGediNET tool, developed on the KNIME platform, offers a practical resource for further exploration in bioinformatics and oncology.

Conference Object | Citation - WoS: 12 | Citation - Scopus: 17
Integrating Gene Ontology Based Grouping and Ranking Into the Machine Learning Algorithm for Gene Expression Data Analysis (Springer International Publishing AG, 2021)
Yousef, Malik; Sayici, Ahmet; Bakir-Gungor, Burcu
Recent advances in high-throughput technologies have produced large gene expression datasets for several phenotypes. By comparing gene expression levels under different conditions, such as disease vs. control, treated vs. untreated, or drug A vs. drug B, one can identify biomarkers. Unlike traditional gene selection approaches, integrative gene selection approaches incorporate domain knowledge from external biological resources during gene selection, which improves interpretability and predictive performance. In this respect, Gene Ontology provides cellular component, molecular function, and biological process terms for the products of each gene.
In this study, we present a Gene Ontology based feature selection approach for gene expression data analysis. Our approach uses the ontology information as grouping (term) information and embeds it into a machine learning algorithm that selects the most significant groups (terms) of the ontology. These groups are then used to build the machine learning model that performs the classification task. The output of the tool is a set of significant ontology groups for a two-class classification task on the gene expression data, which allows researchers to perform more advanced gene expression analyses. We tested our approach on 8 different gene expression datasets. In our experiments, the tool successfully found significant Ontology terms that could serve as a classification model. We believe that our tool will help geneticists identify affected genes in transcriptomic data, and this information could enable the design of platforms that assist diagnosis, assess patients' prognoses, and create patient treatment plans.

Article
Enlightening the Molecular Mechanisms of Type 2 Diabetes With a Novel Pathway Clustering and Pathway Subnetwork Approach (Tubitak Scientific & Technological Research Council Turkey, 2022-01-01)
Bakir-Gungor, Burcu; Yazici, Miray Unlu; Goy, Gokhan; Temiz, Mustafa; Ünlü Yazici, Miray
Type 2 diabetes mellitus (T2D) accounts for 90% of diabetes cases and is a complex multifactorial disease. In the last decade, genome-wide association studies (GWASs) for T2D have successfully pinpointed genetic variants (typically single nucleotide polymorphisms, SNPs) associated with disease risk. To reduce the burden of multiple testing in GWAS, researchers have evaluated the collective effects of interesting variants. In this regard, pathway-based analyses of GWAS have become popular for discovering novel multigenic functional associations.
Still, to reveal the unaccounted-for 85 to 90% of T2D variation, which lies hidden in GWAS datasets, new post-GWAS strategies need to be developed. Here, we reanalyze three meta-analysis GWAS datasets in T2D using a methodology we developed to identify disease-associated pathways by combining nominally significant evidence of genetic association with known biochemical pathways, protein-protein interaction (PPI) networks, and the functional information of selected SNPs. To shed light on the molecular mechanisms underlying T2D development and progression, we integrated different in silico approaches that proceed in top-down and bottom-up manners, and presented a comprehensive analysis at the protein subnetwork, pathway, and pathway subnetwork levels. Using mutual information based on shared genes, we compared the identified protein subnetworks and affected pathways of each dataset. While most of the identified pathways recapitulate the pathophysiology of T2D, our results show that incorporating SNP functional properties and PPI networks into GWAS analysis can dissect leading molecular pathways and could improve on traditional enrichment strategies.

Conference Object
Colorectal Cancer Prediction via Applying Recursive Cluster Elimination With Intra-Cluster Feature Elimination on Metagenomic Pathway Data (Springer International Publishing AG, 2024)
Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
Advances in next-generation sequencing and "-omics" technologies enable the characterization of the human gut microbiome. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental influences, and abnormalities in the gut microbiota. The aim of this study is to identify pathways that influence host metabolism in CRC patients.
The CRC-related metagenomic dataset used in this study contains the relative abundance values of 551 pathways calculated for 1262 samples. Here, two different approaches based on feature grouping reduce the number of features by treating related features as groups, eliminate irrelevant features, and perform classification. The recursive cluster elimination with intra-cluster feature elimination (RCE-IFE) approach achieves an AUC of 0.72 using an average of 66.2 features on the CRC-associated metagenomics dataset. In these experiments, the P163-PWY (L-lysine fermentation to acetate and butanoate) and PWY-6151 (S-adenosyl-L-methionine cycle I) pathways, which are reported by both approaches, are identified as potential biomarkers associated with CRC. This study contributes to the molecular diagnosis and treatment of colorectal cancer by revealing pathways associated with CRC. Our results are promising for the study of the gut microbiota and its role in CRC.
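For the TextNetTopics+ entry, the difference between the two voting schemes can be made concrete. The sketch below is not the TextNetTopics+ code; it is a minimal pure-Python illustration, assuming each classifier outputs per-class probabilities for each document. Hard voting takes the majority of the predicted labels, while soft voting averages the probabilities first, so one confident model can outweigh two hesitant ones. The three "models" and their probabilities are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]]) -> List[int]:
    """Turn one model's per-sample class probabilities into hard labels (argmax)."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in proba]


def hard_vote(all_labels: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote; all_labels[m][i] is model m's label for sample i.

    Ties go to the label seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    n_samples = len(all_labels[0])
    return [
        Counter(labels[i] for labels in all_labels).most_common(1)[0][0]
        for i in range(n_samples)
    ]


def soft_vote(all_proba: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models, then take the argmax per sample."""
    n_models = len(all_proba)
    n_samples = len(all_proba[0])
    n_classes = len(all_proba[0][0])
    predictions = []
    for i in range(n_samples):
        avg = [sum(p[i][c] for p in all_proba) / n_models for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


# Three hypothetical models, one sample, two classes.
probas = [
    [[0.90, 0.10]],  # model A: confident in class 0
    [[0.40, 0.60]],  # model B: leans towards class 1
    [[0.45, 0.55]],  # model C: leans towards class 1
]
hard = hard_vote([labels_from_proba(p) for p in probas])
soft = soft_vote(probas)
print(hard, soft)  # [1] [0]
```

In practice, scikit-learn's `VotingClassifier` provides both modes through `voting="hard"` and `voting="soft"`; soft voting requires every base estimator to expose `predict_proba`.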
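For the SVM-RCE-R optimization entry, the abstract does not spell out the scoring function or the optimizer. The sketch below assumes, hypothetically, that a cluster's score is a weighted sum of classifier performance metrics (here only accuracy and AUC, named `acc` and `auc`), and it uses an exhaustive grid search over weights that sum to 1 as one simple stand-in for "optimization approaches". `toy_objective` is a placeholder for a real evaluation of the weights.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Mapping, Sequence


def weighted_score(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Hypothetical cluster score: weighted sum of per-cluster performance metrics."""
    return sum(weights[name] * metrics[name] for name in weights)


def weight_grid(names: Sequence[str], step: float = 0.25) -> Iterator[Dict[str, float]]:
    """Yield every weight vector on a regular grid whose entries sum to 1.

    `step` is assumed to divide 1 evenly (e.g. 0.5, 0.25, 0.1).
    """
    n_steps = round(1 / step)
    for combo in product(range(n_steps + 1), repeat=len(names)):
        if sum(combo) == n_steps:
            yield {name: k / n_steps for name, k in zip(names, combo)}


def best_weights(names: Sequence[str],
                 objective: Callable[[Dict[str, float]], float],
                 step: float = 0.25) -> Dict[str, float]:
    """Exhaustive grid search: return the weights that maximize objective(weights)."""
    return max(weight_grid(names, step), key=objective)


def toy_objective(w: Dict[str, float]) -> float:
    """Placeholder for a real evaluation, e.g. cross-validated accuracy."""
    return -(w["acc"] - 0.75) ** 2


best = best_weights(["acc", "auc"], toy_objective, step=0.25)
print(best)  # {'acc': 0.75, 'auc': 0.25}
```

In a real run, the objective would rank clusters with `weighted_score` under the candidate weights, run the elimination, and return the resulting model's performance. Grid search grows combinatorially with the number of metrics, which is one reason to prefer a dedicated optimizer for larger metric sets.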
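The miRGediNET entry evaluates feature selection methods with Monte Carlo cross-validation: repeated random train/test splits whose per-iteration metrics are averaged. A minimal sketch follows. The stratified split, the 20% test fraction, the toy data, and the nearest-centroid classifier are assumptions chosen to keep the example dependency-free; they are not the study's settings.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def stratified_split(y: Sequence[int], test_frac: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices, holding out test_frac of each class."""
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    train: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        idxs = list(idxs)
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_frac))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


def monte_carlo_cv(X, y, fit_predict, n_iter=100, test_frac=0.2, seed=0):
    """Repeat random train/test splits and average test accuracy over iterations."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        train, test = stratified_split(y, test_frac, rng)
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return mean(scores), scores


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: assign each test point to the closest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X_train, y_train):
        running = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            running[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in X_test]


X = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2], [5.3], [5.4]]
y = [0] * 5 + [1] * 5
mean_acc, per_iter = monte_carlo_cv(X, y, nearest_centroid, n_iter=20, test_frac=0.2)
print(mean_acc, len(per_iter))  # 1.0 20
```

Unlike k-fold cross-validation, the random splits may overlap across iterations, so the number of iterations can be chosen independently of the dataset size. Stratifying per class keeps both classes present in every training set.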
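The Gene Ontology entry describes using ontology terms as feature groups and selecting the most significant groups for the final model. A minimal sketch of that grouping, scoring, and selection idea is shown below, assuming the groups are given as a hypothetical term-to-feature-index map (`term_A`, `term_B`) and using nearest-centroid validation accuracy as a stand-in for the machine learning algorithm used in the study. The recursive cluster elimination methods listed above also score groups of features by classifier performance, but they build the groups by clustering and remove low-scoring ones iteratively; that loop is not shown here.

```python
from typing import Dict, List, Sequence


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val) -> float:
    """Stand-in scorer: validation accuracy of a nearest-class-mean classifier."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X_train, y_train):
        running = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            running[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
    correct = 0
    for x, label in zip(X_val, y_val):
        # Ties go to the class seen first in y_train (dict insertion order).
        pred = min(centroids,
                   key=lambda lab: sum((u - c) ** 2 for u, c in zip(x, centroids[lab])))
        correct += pred == label
    return correct / len(y_val)


def project(X, cols):
    """Keep only the feature columns that belong to one group."""
    return [[row[j] for j in cols] for row in X]


def score_groups(X_train, y_train, X_val, y_val,
                 groups: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Score each group (e.g. the genes annotated to one term) on its features alone."""
    return {
        name: nearest_centroid_accuracy(project(X_train, cols), y_train,
                                        project(X_val, cols), y_val)
        for name, cols in groups.items()
    }


def select_top_groups(scores: Dict[str, float], k: int) -> List[str]:
    """Rank groups by score (descending) and keep the k best."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def selected_features(groups: Dict[str, Sequence[int]], chosen: List[str]) -> List[int]:
    """Union of feature indices from the chosen groups, for the final model."""
    return sorted({j for name in chosen for j in groups[name]})


groups = {"term_A": [0, 1], "term_B": [2, 3]}  # hypothetical ontology terms
X_train = [[0, 0, 1, 0], [0, 1, 0, 1], [5, 5, 1, 0], [5, 4, 0, 1]]
y_train = [0, 0, 1, 1]
X_val = [[0.2, 0.1, 1, 0], [5, 5, 0, 1]]
y_val = [0, 1]

scores = score_groups(X_train, y_train, X_val, y_val, groups)
top = select_top_groups(scores, k=1)
feats = selected_features(groups, top)
print(scores, top, feats)  # {'term_A': 1.0, 'term_B': 0.5} ['term_A'] [0, 1]
```

Here `term_B` carries no class signal (both class means coincide), so it scores at chance level and only the features of `term_A` reach the final model. Returning group names rather than individual genes is what keeps the output interpretable at the level of ontology terms.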
