Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395


Developing a Label Propagation Approach for Cancer Subtype Classification Problem
(TUBITAK, 2021) Güner, P.; Bakir-Güngör, B.; Coşkun, M.; Şahan, Pınar Güner
Cancer is a disease in which abnormal cells grow uncontrollably and invade other tissues. Many cancer types have subtypes with different clinical and biological implications, and treatment methods need to be customized to these differences. Identifying distinct cancer subtypes is therefore an important problem in bioinformatics, since it can guide future precision medicine applications. To design targeted treatments, bioinformatics methods attempt to uncover the common molecular pathology of different cancer subtypes. Along these lines, several computational methods have been proposed to discover cancer subtypes or to stratify cancer into informative subtypes. However, existing works do not account for the sparseness of the data (genes with low degrees), which results in an ill-conditioned solution. To address this shortcoming, we propose an alternative unsupervised method that stratifies cancer patients into subtypes using applied numerical algebra techniques. More specifically, we applied a label propagation-based approach to stratify the somatic mutation profiles of colon, head and neck, uterine, bladder, and breast tumors. We evaluated the performance of our method by comparing it with baseline methods. Extensive experiments demonstrate that our approach substantially improves tumor classification, outperforming state-of-the-art unsupervised and supervised approaches by a large margin. © 2022 Elsevier B.V., All rights reserved.
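To make the propagation idea concrete, the sketch below smooths one patient's sparse mutation vector over a gene interaction network. It uses generic network propagation (random walk with restart, a close relative of label propagation), not the authors' algorithm; the toy graph, the restart weight alpha=0.7, and the function name propagate are illustrative assumptions.

```python
def propagate(adjacency, seeds, alpha=0.7, tol=1e-9, max_iter=1000):
    """Generic network propagation: iterate f <- alpha * W f + (1 - alpha) * f0.

    adjacency: symmetric (possibly weighted) matrix as a list of lists.
    seeds: initial scores, e.g. 1.0 for each gene mutated in one patient.
    W is the column-normalized adjacency, W[i][j] = A[i][j] / degree(j).
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]  # row sums equal column sums for symmetric A
    f0 = [float(s) for s in seeds]
    f = list(f0)
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Score flowing into gene i from each neighbour j, split evenly over j's edges.
            spread = sum(adjacency[i][j] * f[j] / degree[j]
                         for j in range(n) if degree[j] > 0)
            # Restart term keeps part of the original mutation signal at every step.
            new.append(alpha * spread + (1 - alpha) * f0[i])
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


# Toy gene network: a path 0 - 1 - 2, with only gene 0 mutated (hypothetical data).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = propagate(path, [1, 0, 0])
```

Because W is column-stochastic on a connected graph, the total score stays equal to the seed total, and genes close to the mutated gene receive more of it. Smoothed profiles of this kind could then be clustered into subtypes; that downstream step is not shown.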
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling
(Springer International Publishing AG, 2025) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics selects cohesive topics extracted via Latent Dirichlet Allocation as features for document representation, effectively reducing dimensionality while preserving the semantic structure of the text. This study evaluates the performance of TextNetTopics using multiple machine learning algorithms in the M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further enhance classification performance, we introduce TextNetTopics+, an ensemble-based extension that uses both hard voting and soft voting to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets demonstrate that ensemble learning in TextNetTopics+ significantly outperforms the individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.
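The difference between the two voting mechanisms can be sketched in a few lines. This is a generic illustration, not TextNetTopics+ code; the probability values are made up and the function names are hypothetical. In scikit-learn, `VotingClassifier` exposes the same two modes through `voting="hard"` and `voting="soft"`.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each classifier's most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    # Counter.most_common orders equal counts by first occurrence, which breaks ties.
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Class with the highest probability averaged across classifiers."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Per-classifier probabilities for classes [0, 1] on one document (made-up numbers).
probas = [[0.10, 0.90],   # e.g. a confident Random Forest
          [0.60, 0.40],   # e.g. an SVM
          [0.55, 0.45]]   # e.g. Logistic Regression
```

Here hard voting picks class 0 (two of three classifiers lean that way), while soft voting picks class 1, because one classifier's high confidence outweighs two marginal votes. Soft voting therefore requires calibrated probability outputs from every member.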
Citation - WoS: 8
Citation - Scopus: 12
SVM-RCE-R Optimization of Scoring Function for SVM-RCE
(Springer International Publishing AG, 2021) Yousef, Malik; Jabeer, Amhar; Bakir-Gungor, Burcu
Classifying gene expression data is challenging because such data have high dimensionality and relatively small sample sizes. Different feature selection approaches have been used to overcome this issue, and SVM-RCE is one of the more successful ones. This study continues two previous research studies, SVM-RCE and SVM-RCE-R. SVM-RCE-R introduced a new scoring function for the clusters and showed that certain combinations of weights improved performance. The aim of this study is to find the optimal weights for the scoring function proposed in SVM-RCE-R using optimization approaches. We found that optimizing the weights of the scoring function improves performance in most cases, and that in some cases accuracy and AUC increase dramatically, by 10%. By improving the performance of the algorithm, we are more likely to extract a subset of genes related to the class association of a microarray sample.
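A minimal sketch of the idea of a weighted cluster-scoring function and a search over its weights, assuming the score is a weighted sum of per-cluster performance metrics. The metric names, the exhaustive grid search (a simple stand-in for the unspecified "optimization approaches"), and all function names are hypothetical, not the published SVM-RCE-R implementation.

```python
from itertools import product


def cluster_score(metrics, weights):
    """Weighted sum of a cluster's performance metrics (hypothetical form)."""
    return sum(w * metrics[name] for name, w in weights.items())


def eliminate_lowest(clusters, weights, fraction=0.1):
    """Recursive-elimination step: drop the lowest-scoring fraction (at least one)."""
    ranked = sorted(clusters, key=lambda c: cluster_score(clusters[c], weights))
    n_drop = max(1, int(len(ranked) * fraction))
    return ranked[n_drop:]  # surviving clusters, lowest score first


def weight_grid(names, step=0.25):
    """All weight vectors on a regular grid whose entries sum to 1."""
    m = round(1 / step)
    return [{n: c / m for n, c in zip(names, counts)}
            for counts in product(range(m + 1), repeat=len(names))
            if sum(counts) == m]


# Made-up per-cluster metrics, e.g. from an inner cross-validation loop.
clusters = {"c1": {"acc": 0.90, "auc": 0.80},
            "c2": {"acc": 0.50, "auc": 0.60},
            "c3": {"acc": 0.70, "auc": 0.95}}
kept = eliminate_lowest(clusters, {"acc": 0.5, "auc": 0.5}, fraction=0.34)
```

To search the weights, one would evaluate each candidate with an objective such as `best = max(weight_grid(["acc", "auc"]), key=objective)`, where the hypothetical `objective` reruns the full elimination pipeline and returns a validation score. A grid is only practical for a handful of metrics; its size grows combinatorially.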
Citation - Scopus: 3
ROSE: A Novel Approach for Protein Secondary Structure Prediction
(Springer Science and Business Media Deutschland GmbH, 2021) Görmez, Yasin; Aydin, Zafer
The three-dimensional structure of a protein provides important information about its function. Since determining protein structure experimentally is time-consuming and costly, computational prediction of three-dimensional protein structures has become an efficient alternative. One of the most important steps in 3-D protein structure prediction is protein secondary structure prediction. Proteins that contain different numbers and sequences of amino acids may have similar structures. Thus, extracting appropriate input features is crucial for secondary structure prediction. In this study, a novel model, ROSE, is proposed for secondary structure prediction; it obtains probability distributions as feature vectors using two position-specific scoring matrices obtained from PSI-BLAST and HHblits. ROSE is a two-stage hybrid classifier that uses a one-dimensional bidirectional recurrent neural network in the first stage and a support vector machine in the second stage. It is also combined with the DSPRED method, which employs dynamic Bayesian networks and a support vector machine. ROSE obtained results comparable to DSPRED in cross-validation experiments on a difficult benchmark and can serve as an alternative method for protein secondary structure prediction. © 2021 Elsevier B.V., All rights reserved.
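In two-stage predictors of this kind, the second-stage classifier typically sees a sliding window of the first stage's per-residue probability distributions. The sketch below builds such window features with zero padding at the chain ends; the window size, the three-state (helix, strand, coil) layout, and the function name are assumptions for illustration, not details taken from ROSE.

```python
def window_features(probs, half_window=2):
    """Concatenate per-residue probability vectors in a window around each residue.

    probs: one list of class probabilities per residue, e.g. [P(H), P(E), P(C)].
    Positions outside the chain are zero-padded.
    """
    n_states = len(probs[0])
    pad = [0.0] * n_states
    features = []
    for i in range(len(probs)):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(probs[j] if 0 <= j < len(probs) else pad)
        features.append(vec)  # length (2 * half_window + 1) * n_states
    return features


# Hypothetical first-stage outputs for a 4-residue fragment.
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.7, 0.1],
         [0.1, 0.1, 0.8]]
feats = window_features(probs, half_window=2)
```

Each row of `feats` could then be passed to an SVM (for example scikit-learn's `SVC`) as the second-stage input, letting it correct first-stage predictions using neighbouring residues.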
Citation - WoS: 3
Citation - Scopus: 3
MicroRNA Prediction Based on 3D Graphical Representation of RNA Secondary Structures
(Tubitak Scientific & Technological Research Council Turkey, 2019-08-05) Sacar Demirci, Muserref Duygu; Demirci, Müşerref Duygu Saçar
MicroRNAs (miRNAs) are posttranscriptional regulators of gene expression. A miRNA can target hundreds of messenger RNAs (mRNAs), an mRNA can be targeted by different miRNAs, and a single miRNA may have several binding sites in one mRNA sequence. This makes the experimental investigation of miRNAs difficult, so machine learning (ML) is frequently used to overcome these challenges. The key parts of an ML analysis depend largely on the quality of the input data and on how well the features describe the data. More than 1000 features have previously been suggested for miRNAs. Here, it is shown that 36 features representing the RNA secondary structure and its dynamic 3D graphical representation achieve accuracy of up to 98%. In this study, a new approach for ML-based miRNA prediction is proposed. Thousands of models are generated by classifying known human miRNAs and pseudohairpins with three classifiers: decision tree, naive Bayes, and random forest. Although the method is based on human data, the best model correctly assigned 96% of nonhuman hairpins from MirGeneDB, suggesting that this approach might be useful for analyzing miRNAs from other species.
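As a hedged illustration of how secondary-structure features can be derived, the sketch below parses a dot-bracket structure into base pairs and computes three simple descriptors. These are generic examples only; they are not the 36 features of the study, which also include a 3D graphical representation that is not reproduced here. The hairpin and all function names are hypothetical.

```python
def base_pairs(structure):
    """Return (i, j) index pairs from a dot-bracket string such as '(((...)))'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def structure_features(sequence, structure):
    """A few illustrative hairpin descriptors (not the published feature set)."""
    pairs = base_pairs(structure)
    seq = sequence.upper()
    gc_pairs = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "C"})
    return {
        "gc_content": (seq.count("G") + seq.count("C")) / len(seq),
        "paired_fraction": 2 * len(pairs) / len(structure),
        "gc_pair_fraction": gc_pairs / len(pairs) if pairs else 0.0,
    }


# Toy hairpin: three G-C pairs closing a three-nucleotide loop.
feats = structure_features("GGGAAACCC", "(((...)))")
```

Feature dictionaries like this, computed for known miRNAs and pseudohairpins, would form the training table for classifiers such as decision trees, naive Bayes, or random forests.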
Leveraging MicroRNA-Gene Associations With miRGediNET: An Intelligent Approach for Enhanced Classification of Breast Cancer Molecular Subtypes
(Springer International Publishing AG, 2025) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik
Understanding the molecular subtypes of breast cancer is crucial for advancing targeted therapies and precision medicine. For the BRCA molecular subtype prediction problem, this study employs miRGediNET, a machine learning approach that integrates data from the miRTarBase, DisGeNET, and HMDD databases to investigate shared gene associations between microRNA (miRNA) activity and disease mechanisms. Using the BRCA LumAB_Her2Basal dataset, we evaluate miRGediNET's performance against traditional feature selection methods, including CMIM, mRMR, Information Gain (IG), SelectKBest (SKB), Fast Correlation-Based Filter (FCBF), and XGBoost (XGB). These feature selection techniques were assessed using various classification algorithms, including Random Forest (RF), Support Vector Machine (SVM), LogitBoost, Decision Tree, and AdaBoost, all executed with default parameters. The feature selection methods were tested using Monte Carlo Cross-Validation, and the performance metrics obtained in each iteration were averaged to ensure robustness. Our findings reveal that miRGediNET outperforms traditional methods in accuracy and Area Under the Curve (AUC), emphasizing its superior capability to identify key genes that bridge miRNA interactions and breast cancer mechanisms. Notably, both miRGediNET and Information Gain (IG) feature selection consistently identified ESR1, a critical biomarker frequently reported in recent research to be associated with breast cancer prognosis and resistance to endocrine therapies. This integrative approach provides deeper biological insights into miRNA-disease interactions, paving the way for enhanced patient stratification, biomarker discovery, and personalized medicine strategies. The miRGediNET tool, developed on the KNIME platform, offers a practical resource for further exploration in bioinformatics and oncology.
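Monte Carlo cross-validation repeats a random train/test split many times and averages the resulting metric. The pure-Python sketch below shows the splitting and averaging; the caller-supplied `evaluate` function, the 100 iterations, and the 20% test fraction are illustrative assumptions rather than the study's settings. scikit-learn's `ShuffleSplit` (or `StratifiedShuffleSplit`, which preserves class proportions) produces equivalent splits.

```python
import random
from statistics import mean


def monte_carlo_cv(n_samples, evaluate, n_iter=100, test_fraction=0.2, seed=0):
    """Average evaluate(train_idx, test_idx) over repeated random splits.

    evaluate: caller-supplied function that trains on train_idx, scores on
    test_idx, and returns a number (e.g. accuracy or AUC).
    """
    rng = random.Random(seed)  # fixed seed makes the splits reproducible
    indices = list(range(n_samples))
    n_test = max(1, int(round(n_samples * test_fraction)))
    scores = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        test_idx = indices[:n_test]   # slicing copies, so later shuffles don't alter them
        train_idx = indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores)
```

Unlike k-fold cross-validation, a sample may land in the test set in several iterations or in none, so the number of iterations, not a fold count, controls the variance of the averaged estimate.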
Citation - Scopus: 2
Data-Driven Methods for Optimal Setting of Legacy Control Devices in Distribution Grids
(IEEE Computer Society, 2024-07-21) Savasci, Alper; Ceylan, Oǧuzhan; Paudyal, Sumit
This study presents machine learning-based dispatch strategies for legacy voltage regulation devices, i.e., on-load tap changers (OLTCs), step-voltage regulators (SVRs), and switched capacitors (SCs), in modern distribution networks. The proposed approach uses k-nearest neighbor (KNN), random forest (RF), and neural network (NN) models to map nodal net active and reactive power injections to the optimal legacy controls and the resulting voltage magnitudes. To implement these strategies, an efficient optimal power flow (OPF) is first formulated as a mixed-integer linear program that obtains optimal tap positions for OLTCs and SVRs and on/off statuses for SCs. Training and testing datasets are then generated by solving the OPF model over daily horizons at 1-hour resolution for varying load and photovoltaic (PV) generation profiles. Case studies on the 33-node feeder demonstrate high-accuracy mapping between the input features and the output vector, which is promising for integrated Volt/VAr control schemes. © 2024 Elsevier B.V., All rights reserved.
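The KNN mapping can be sketched as a nearest-neighbour lookup from an injection vector to a vector of discrete control settings learned from OPF solutions. The two-feature inputs, the two-entry outputs (a tap position and a capacitor on/off status), k=2, and the per-output majority vote are hypothetical simplifications, not the paper's setup.

```python
import math
from collections import Counter


def knn_predict(train_X, train_Y, x, k=3):
    """Predict each discrete control by majority vote among the k nearest samples.

    train_X: feature vectors (e.g. nodal net P/Q injections).
    train_Y: matching OPF-optimal control vectors (e.g. tap positions, SC statuses).
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    n_out = len(train_Y[0])
    return [Counter(train_Y[i][d] for i in nearest).most_common(1)[0][0]
            for d in range(n_out)]


# Hypothetical training set: [P, Q] injections -> [tap position, capacitor on/off].
train_X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
train_Y = [[1, 0], [1, 0], [3, 1], [3, 1]]
controls = knn_predict(train_X, train_Y, [0.05, 0.02], k=2)
```

Voting per output keeps predicted tap positions and switch states on their discrete sets, which averaging would not. In practice the injection features would be scaled before computing distances, since active and reactive powers can differ in magnitude.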
Citation - Scopus: 4
Computational Detection of Pre-MicroRNAs
(Humana Press Inc., 2021-08-26) Saçar Demirci, Müşerref Duygu
MicroRNA (miRNA) studies have been among the most popular research areas in recent years. Although thousands of miRNAs have been detected in several species, the majority remain unidentified. Finding novel miRNAs is therefore vital for investigating the machinery of miRNA-mediated posttranscriptional gene regulation. Furthermore, experimental methods are poorly suited to detecting rare miRNAs and are limited to the state of the organism under examination (e.g., tissue type, developmental stage, stress or disease conditions). These issues have motivated the development of computational methods that aim to identify potential miRNAs in silico. However, most of these tools suffer from high numbers of false positives and/or false negatives, and as a result they do not provide enough confidence for validating all of their predictions experimentally. In this chapter, the computational difficulties in detecting pre-miRNAs are discussed, and a machine learning-based approach designed to address these issues is reviewed. © 2021 Elsevier B.V., All rights reserved.
Colorectal Cancer Prediction via Applying Recursive Cluster Elimination With Intra-Cluster Feature Elimination on Metagenomic Pathway Data
(Springer International Publishing AG, 2024) Temiz, Mustafa; Kuzudisli, Cihan; Yousef, Malik; Bakir-Gungor, Burcu
Advances in next-generation sequencing and "-omics" technologies enable the characterization of the human gut microbiome. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental influences, and abnormalities in the gut microbiota. The aim of this study is to identify pathways that influence host metabolism in CRC patients. The CRC-related metagenomic dataset used in this study contains the relative abundance values of 551 pathways calculated for 1262 samples. Here, two approaches based on feature grouping reduce the number of features by treating related features as groups, eliminate irrelevant features, and perform classification. The recursive cluster elimination with intra-cluster feature elimination (RCE-IFE) approach achieves an AUC of 0.72 using an average of 66.2 features on the CRC-associated metagenomic dataset. In these experiments, the P163-PWY (L-lysine fermentation to acetate and butanoate) and PWY-6151 (S-adenosyl-L-methionine cycle I) pathways, reported by both approaches, are identified as potential biomarkers associated with CRC. This study contributes to the molecular diagnosis and treatment of colorectal cancer by revealing pathways associated with CRC. Our results are promising for the study of the gut microbiota and its role in CRC.
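The intra-cluster feature elimination step can be pictured as pruning the least important features inside each feature group while always keeping at least one representative. The sketch below assumes importance scores are already available (for example, from a classifier trained on each group); the pathway-like names, scores, fraction, and function name are hypothetical and do not reproduce RCE-IFE.

```python
def intra_cluster_elimination(clusters, importance, fraction=0.25):
    """Remove the least important `fraction` of features within each cluster.

    clusters: mapping of cluster name -> list of feature names.
    importance: mapping of feature name -> importance score (higher is better).
    At least one feature is always kept per cluster.
    """
    reduced = {}
    for name, features in clusters.items():
        ranked = sorted(features, key=importance.__getitem__, reverse=True)
        n_keep = max(1, len(ranked) - int(len(ranked) * fraction))
        reduced[name] = ranked[:n_keep]
    return reduced


# Hypothetical feature groups and importance scores.
clusters = {"g1": ["pwy_a", "pwy_b", "pwy_c", "pwy_d"], "g2": ["pwy_e"]}
importance = {"pwy_a": 0.1, "pwy_b": 0.4, "pwy_c": 0.3, "pwy_d": 0.2, "pwy_e": 0.5}
reduced = intra_cluster_elimination(clusters, importance, fraction=0.25)
```

Alternating this within-group pruning with the removal of whole low-scoring groups shrinks the feature set gradually, which matches the general idea of combining cluster-level and feature-level elimination described above.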
Citation - Scopus: 1
Integrative Analyses in Omics Data: Machine Learning Perspective
(Deutsche Gesellschaft fur Medizinische Informatik, Biometrie und Epidemiologie e.V., 2023) Ünlü Yazici, Miray; Bakir-Güngör, Burcu; Yousef, Malik; Yazici, Miray Unlu
Developments in high-throughput technologies have enabled the production of an immense amount of knowledge at the multi-omics level. For complex diseases influenced by multiple factors, single-omics datasets might not be sufficient to unveil the molecular mechanisms of these heterogeneous diseases. Providing a comprehensive and systematic overview that explains disease hallmarks in significant depth is critical. The use of multi-omics datasets has led to the development of a variety of tools and platforms, many of which use machine learning models to tackle the complexity of disorders and to identify new biomolecular signatures and potential markers. These approaches are fundamentally based on training models to make predictions and to classify the given data. In this review, we describe current machine learning-based approaches and available implementations. Challenges in elucidating the mechanisms of disease onset and progression, and the future development of the field of medicine, will be discussed. The importance of interpreting model output in light of corresponding biological knowledge will also be covered in this review. © 2023 Elsevier B.V., All rights reserved.
