Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

Now showing 1 - 7 of 7
  • Article
    Citation - WoS: 2
    Citation - Scopus: 4
    RCE-IFE: Recursive Cluster Elimination With Intra-Cluster Feature Elimination
    (PeerJ Inc, 2025-02-07) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, Malik
    The computational and interpretational difficulties caused by the ever-increasing dimensionality of biological data generated by new technologies pose a significant challenge. Feature selection (FS) methods aim to reduce the dimension, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant features. In this work, we propose the Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE) method, which utilizes feature grouping and iterates grouping and elimination steps in a supervised context. We assess the dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains. For a set of gene expression, microRNA (miRNA) expression, and methylation datasets, the performance of RCE-IFE is comparatively evaluated with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. On average, RCE-IFE attains an area under the curve (AUC) of 0.85 among tested expression datasets with the fewest features and the shortest running time, while RCE-IFE-SVM and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively, when averaged over seven different metagenomics datasets, with RCE-IFE significantly reducing feature subsets. Furthermore, RCE-IFE surpasses several state-of-the-art FS methods, such as Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE gives a similar average accuracy rate of 89.27% using fewer features on four cancer-related datasets. The comparability of RCE-IFE is also verified with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet. Additionally, the biological relevance of the features selected by RCE-IFE is evaluated. The proposed method also exhibits high consistency in terms of the selected features across multiple runs. Our experimental findings imply that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.
  • Article
    Citation - Scopus: 2
    Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring-Modeling (G-S-M) Approach
    (Elsevier Ltd, 2025-03) Bakir-Güngör, Burcu; Temiz, Mustafa; Canakcimaksutoglu, Beyza; Yousef, Malik
    Colorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and serves as a biomarker for early detection and treatment. This research effort focuses on the identification of potential taxonomic biomarkers of CRC using a grouping-based feature selection method. Additionally, this study investigates the effect of incorporating biological domain knowledge into the feature selection process while identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose a taxonomy-based Grouping-Scoring-Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. In this study, using metagenomic data related to CRC, classification is performed at three taxonomic levels (genus, family, and order). The MetaPhlAn tool is employed to determine the relative abundance values of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. When tested on two CRC-associated metagenomics datasets, the highest performance, an AUC of 0.90, is observed at the genus taxonomic level. At this level, 7 of the top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas, and Solobacterium) were identified in both datasets. Moreover, the microorganisms identified at the genus, family, and order levels are thoroughly discussed with reference to the CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development, but also highlights the applicability of the taxonomy-based G-S-M method in tackling various diseases. © 2025 Elsevier B.V. All rights reserved.
  • Article
    Citation - WoS: 9
    Citation - Scopus: 15
    MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach
    (Frontiers Media S.A., 2023-11-22) Bakir-Gungor, Burcu; Temiz, Mustafa; Jabeer, Amhar; Wu, Di; Yousef, Malik
    Numerous biological environments have been characterized with the advent of metagenomic sequencing using next-generation sequencing, which lays out the relative abundance values of microbial taxa. Modeling the human microbiome using machine learning models has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis. The microBiomeGSM tool presented in this work incorporates pre-existing taxonomy information into a machine learning approach and attempts to solve the classification problem in disease-associated metagenomics datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species-level information is used as features and classified by relating their taxonomic features at different levels, including genus, family, and order. Using four different disease-associated metagenomics datasets, the performance of microBiomeGSM is comparatively evaluated with other feature selection methods such as Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Minimum Redundancy Maximum Relevance (MRMR), and Information Gain (IG), and also with other classifiers such as AdaBoost, Decision Tree, LogitBoost, and Random Forest. microBiomeGSM achieved its highest result, an area under the curve (AUC) value of 0.98, at the order taxonomic level for the IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups that are identified as important for the disease under study and the names of the species within these groups. The association between the detected species and the disease under investigation is confirmed by previous studies in the literature. The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM.
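Several abstracts in this listing report performance as an AUC. As a reading aid, the following is a minimal sketch of how an AUC can be computed from class labels and classifier scores using the rank-sum identity. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from any of the listed papers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of classifier scores (higher = more likely positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three diseased and three healthy samples.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y, s)
print(round(auc, 3))
```

The value is a proportion between 0 and 1, where 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.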
  • Article
    Citation - WoS: 29
    Citation - Scopus: 32
    Liver Fibrosis Staging Using CT Image Texture Analysis and Soft Computing
    (Elsevier, 2014-12) Kayaalti, Omer; Aksebzeci, Bekir Hakan; Karahan, Ibrahim Okkes; Deniz, Kemal; Ozturk, Mehmet; Yilmaz, Bulent; Asyali, Musa Hakan
    Liver biopsy is considered to be the gold standard for analyzing chronic hepatitis and fibrosis; however, it is an invasive and expensive approach, which is also difficult to standardize. Medical imaging techniques such as ultrasonography, computed tomography (CT), and magnetic resonance imaging are non-invasive and helpful methods to interpret liver texture, and may be good alternatives to needle biopsy. Recently, instead of visual inspection of these images, approaches based on computer-aided image analysis have become more popular. In this study, a non-invasive, low-cost, and relatively accurate method was developed to determine liver fibrosis stage by analyzing texture features of liver CT images. In this approach, suitable regions of interest were selected on CT images and a comprehensive set of texture features was obtained from these regions using different methods, such as Gray Level Co-occurrence Matrix (GLCM), Laws' method, Discrete Wavelet Transform (DWT), and Gabor filters. Afterwards, sequential floating forward selection and exhaustive search methods were used in various combinations to select the most discriminating features. Finally, the selected texture features were classified using two methods, namely, Support Vector Machines (SVM) and k-nearest neighbors (k-NN). The mean classification accuracy in pairwise group comparisons was approximately 95% for both classification methods using only 5 features. The performance of our approach in classifying the liver fibrosis stage of subjects in the test set into 7 possible stages was also investigated. In this case, both SVM and k-NN returned relatively low classification accuracies. Our pairwise group classification results showed that DWT, Gabor, GLCM, and Laws' texture features were more successful than the others; therefore, features extracted with these methods were used in the feature fusion process. Fusing features from these better-performing families further improved the classification performance. The results show that our approach can be used as a decision support system, especially in pairwise fibrosis stage comparisons. (C) 2014 Elsevier B.V. All rights reserved.
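To illustrate one of the texture descriptors named in this abstract, the sketch below builds a gray-level co-occurrence matrix for a single pixel offset and derives two common GLCM features, contrast and energy. The 4x4 region, the four gray levels, and the horizontal offset are illustrative assumptions; the abstract does not state the quantization, offsets, or feature set used in the study.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    image: 2-D list of integer gray levels in range(levels).
    dx, dy: column and row offset of the neighbor pixel.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared gray-level difference."""
    n = len(p)
    return sum(((i - j) ** 2) * p[i][j] for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared probabilities; higher for more uniform textures."""
    return sum(v * v for row in p for v in row)


# Hypothetical 4x4 region of interest quantized to 4 gray levels.
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(roi, levels=4)
print(contrast(p), energy(p))
```

In practice, matrices are usually computed for several offsets and directions, and the resulting features are concatenated before feature selection.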
  • Article
    Citation - WoS: 5
    Citation - Scopus: 4
    Investigating Strain Rate Effects on Damage Mechanisms in Hybrid Laminated Composites Using Acoustic Emission
    (Elsevier Sci Ltd, 2025-12) Gulsen, Abdulkadir; Kolukisa, Burak; Etcil, Mustafa; Caliskan, Umut; Zafar, Hafiz Muhammad Numan; Demirbas, Munise Didem; Bakir-Gungor, Burcu
    Hybrid composites, which combine distinct fiber types such as carbon, basalt, and aramid, provide a synergistic balance of strength, stiffness, impact resistance, and energy dissipation, making them appealing for critical applications in aerospace, automotive, and other high-performance industries. Monitoring damage progression in these composites is vital for ensuring structural integrity and preventing catastrophic failures. Acoustic emission (AE) serves as a powerful, noninvasive technique for real-time structural health monitoring, capturing the transient stress waves generated when damage events occur. This study utilizes AE to examine the influence of strain rate on damage modes in carbon/basalt/aramid hybrid composites under three-point bending. Unsupervised feature selection based on Laplacian scores is employed to identify the AE features most relevant to damage modes, while SHapley Additive Explanations (SHAP) are used to evaluate the correlation between AE features and strain rates. The correlation analysis results indicate that peak frequency (PF) serves as a key indicator, demonstrating significant shifts at higher strain rates. Gaussian Mixture Model (GMM) clustering is applied to the AE signals using the features selected through Laplacian scores, with Silhouette scores employed to determine the optimal number of clusters. This study highlights the role of AE in understanding fiber interactions and damage evolution, offering valuable insights into the mechanical performance and optimization of carbon/basalt/aramid hybrid composite structures.
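The use of Silhouette scores to choose the number of clusters can be sketched as follows. This is a minimal illustration, not the study's pipeline: the peak-frequency values are invented, and the two labelings are hand-written stand-ins for the output of a clustering step (no GMM is fitted here).

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: its silhouette is defined as 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Hypothetical AE peak frequencies (kHz), one feature per signal.
signals = [(100.0,), (110.0,), (300.0,), (310.0,)]
candidate_labelings = {
    2: [0, 0, 1, 1],  # stand-in for a 2-cluster result
    3: [0, 0, 1, 2],  # stand-in for a 3-cluster result
}
scores = {k: silhouette_score(signals, lab) for k, lab in candidate_labelings.items()}
best_k = max(scores, key=scores.get)
print(best_k)
```

The candidate with the highest mean silhouette is kept, which here favors two clusters because the third cluster only splits an already compact group.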
  • Article
    Citation - WoS: 25
    Citation - Scopus: 31
    Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods
    (Frontiers Media S.A., 2021-08-25) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, Malik
    Human gut microbiota is a complex community of organisms including trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies have accelerated the discovery of human gut microbiota. In this respect, machine learning techniques have become popular for analyzing disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is a pressing need to develop a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the Select K Best approach. To test the performance of the classification based on the features selected by the different methods, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of minimizing the microbiota used for the diagnosis of T2D and thus reducing the time and cost. When we performed biological validation of the identified species, we found that some of them are known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers. Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and finds out which subset of microbiota is more informative than other taxa by applying state-of-the-art feature selection methods.
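Monte Carlo cross-validation, as mentioned in this abstract, repeatedly draws a fresh random train/test partition instead of cycling through fixed folds. The sketch below generates 100 such partitions; the 10% test fraction, the seed, and the function name `monte_carlo_splits` are assumptions made for illustration, since the abstract does not give the split ratio.

```python
import random


def monte_carlo_splits(n_samples, n_splits=100, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for repeated random sub-sampling.

    Each iteration reshuffles all sample indices independently, so a sample
    may appear in the test set of several iterations or of none.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# Hypothetical cohort of 50 samples.
splits = list(monte_carlo_splits(n_samples=50))
train, test = splits[0]
print(len(splits), len(train), len(test))
```

A classifier would be trained and evaluated once per partition, and the reported metric would be the mean over all iterations.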
  • Article
    Citation - WoS: 10
    Citation - Scopus: 13
    A Noise-Aware Feature Selection Approach for Classification
    (Springer, 2021-02-17) Sabzekar, Mostafa; Aydin, Zafer
    A noise-aware version of support vector machines is utilized for feature selection in this paper. Combining this method with sequential backward search (SBS), a new algorithm for removing irrelevant features is proposed. Although feature selection methods in the literature that utilize support vector machines have provided acceptable results, noisy samples and outliers may affect the performance of the SVM and, consequently, of the feature selection method. Recently, we proposed the relaxed constraints SVM (RSVM), which handles noisy data and outliers. Each training sample in RSVM is associated with a degree of importance obtained with the fuzzy c-means clustering method, so that a lower degree of importance is assigned to noisy data and outliers. Moreover, RSVM has more relaxed constraints that can reduce the effect of noisy samples. Feature selection increases the accuracy of different machine learning applications by eliminating noisy and irrelevant features. In the proposed RSVM-SBS feature selection algorithm, noisy data have little effect on the elimination of irrelevant features. Experimental results using real-world data verify that RSVM-SBS achieves better results than other feature selection approaches utilizing support vector machines.
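The sequential backward search component can be sketched independently of the classifier. The example below starts from all features and greedily removes the one whose removal leaves the highest score. The scoring function is a toy stand-in (a sum of invented per-feature usefulness values), not RSVM; in the paper's setting the score would come from the noise-aware classifier, whose details are not given in the abstract.

```python
def sequential_backward_search(features, score, n_keep):
    """Greedy backward elimination.

    features: list of feature names.
    score: callable mapping a list of features to a number (higher is better).
    n_keep: number of features to retain.
    """
    selected = list(features)
    while len(selected) > n_keep:
        # Try dropping each remaining feature; keep the best-scoring subset.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates, key=score)
    return selected


# Hypothetical usefulness values standing in for classifier accuracy.
usefulness = {"f1": 0.5, "f2": 0.0, "f3": 0.3, "f4": -0.1}


def toy_score(subset):
    return sum(usefulness[f] for f in subset)


kept = sequential_backward_search(list(usefulness), toy_score, n_keep=2)
print(kept)
```

Each pass evaluates one candidate subset per remaining feature, so the number of score evaluations grows quadratically with the number of features.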