WoS-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/394

Search Results

Now showing 1 - 4 of 4
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    TextNetTopics-SFTS-SBTS: TextNetTopics Scoring Approaches Based Sequential Forward and Backward
    (Springer International Publishing AG, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
    TextNetTopics is a text classification-based topic modeling approach that performs topic selection rather than word selection to train a machine learning algorithm. However, one main limitation of TextNetTopics is that its scoring component (the S component) assesses each topic independently and ranks them accordingly, neglecting the potential relationship between topics. In order to address this limitation and improve the classification performance, this study introduces an enhancement to TextNetTopics. TextNetTopics-SFTS-SBTS integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. This integration aims to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained across three datasets offer valuable insights into the context-dependent effectiveness of the new scoring mechanisms across diverse datasets and varying numbers of topics involved in the analysis.
  • Conference Object
    Citation - Scopus: 1
    SEMANT - Feature Group Selection Utilizing fastText-Based Semantic Word Grouping, Scoring, and Modeling Approach for Text Classification
    (Springer International Publishing AG, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
    Text classification presents a challenge due to its high-dimensional feature space. As such, devising an effective feature selection scheme is essential. In this study, we present SEMANT, a novel hybrid filter-wrapper feature selection method that utilizes filter-based Chi-Square and the wrapper-based G-S-M approach. SEMANT incorporates fastText neural word embedding similarities to promote greater semantic inclusion in the selection of features for text classification tasks. The performance of the proposed method was investigated on the WOS-5736 and LitCovid datasets and compared with TextNetTopics, a topic modeling-based topic selection algorithm for text classification. Experimental results confirm that the proposed approach outperforms its alternative.
  • Article
    Citation - WoS: 48
    Citation - Scopus: 65
    Review of Feature Selection Approaches Based on Grouping of Features
    (PeerJ Inc, 2023-07-17) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Bulut, Nurten; Qaqish, Bahjat; Yousef, Malik
    With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
  • Article
    Citation - WoS: 35
    Citation - Scopus: 42
    Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Different Feature Selection Methods
    (PeerJ Inc, 2022-04-25) Bakir-Gungor, Burcu; Lar, Hilal Hac; Jabeer, Amhar; Nalbantoglu, Ozkan Ufuk; Aran, Oya; Yousef, Malik; Hacilar, Hilal
    The tremendous boost in next-generation sequencing and in "omics" technologies makes it possible to characterize the human gut microbiome, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, alteration of the complexity and eubiotic state of the microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as a westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from complete. Due to the complexity of metagenomic studies, applications of state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in the field of metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), minimum Redundancy Maximum Relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), the XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD, thus reducing cost and time. We observed that, compared to Decision Tree, Support Vector Machine, LogitBoost, AdaBoost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD, and these findings might be useful for the development of microbiome-based diagnostics.
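
The last abstract above mentions feature selection evaluated with 100-fold Monte Carlo cross-validation (MCCV). As a rough illustration of that general idea only, the following is a minimal sketch, not the authors' pipeline. Everything in it is an assumption made for illustration: the function names (make_toy_data, top_k_features, nearest_centroid_predict, monte_carlo_cv) are hypothetical, the data are synthetic, a simple mean-difference filter stands in for the SKB, IG, and XGBoost selectors, and a nearest-centroid rule stands in for the Random Forest classifier. The point it shows is that, in MCCV, the data are repeatedly split at random and features are selected on the training part of each split only.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=30, n_features=10, n_informative=2, seed=1):
    """Synthetic two-class data (an assumption, not real microbiome data).
    Only the first `n_informative` features differ between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def top_k_features(X, y, k):
    """Simple filter: rank features by absolute difference of class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class with the closest centroid,
    using only the selected feature indices."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(row[j] for row in rows) for j in features]
    preds = []
    for row in X_test:
        dist = {label: sum((row[j] - c) ** 2 for j, c in zip(features, cen))
                for label, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def monte_carlo_cv(X, y, k=2, n_splits=100, test_fraction=0.3, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, int(n * test_fraction))
    accuracies = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        if len(set(y_train)) < 2:
            continue  # skip splits whose training part lacks one class
        # Select features on the training part only, to avoid leakage.
        features = top_k_features(X_train, y_train, k)
        preds = nearest_centroid_predict(
            X_train, y_train, [X[i] for i in test_idx], features)
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(hits / n_test)
    return mean(accuracies)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean MCCV accuracy:", monte_carlo_cv(X, y, k=2, n_splits=100))
```

Selecting features inside the loop, rather than once on the full dataset, keeps each test split unseen by the selector; whether the cited study structured its experiments this way is not stated in the abstract.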