Scopus İndeksli Yayınlar Koleksiyonu

Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/395

Browse

Search Results

Now showing 1 - 5 of 5

Citation - Scopus: 1
eTNT: Enhanced Textnettopics With Filtered LDA Topics and Sequential Forward / Backward Topic Scoring Approaches
(Science and Information Organization, 2024) Voskergian, Daniel; Jayousi, Rashid; Bakir-Güngör, Burcu
TextNetTopics is a novel text classification-based topic modelling approach that focuses on topic selection rather than individual word selection to train a machine learning algorithm. However, one key limitation of TextNetTopics is its scoring component, which evaluates each topic in isolation and ranks them accordingly, ignoring the potential relationships between topics. In addition, the chosen topics may contain redundant or irrelevant features, potentially increasing the feature set size and introducing noise that can degrade the overall model performance. To address these limitations and improve the classification performance, this study introduces an enhancement to TextNetTopics. eTNT integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. Moreover, it incorporates a filtering component that aims to enhance topics' quality and discriminative power by removing non-informative features from each topic using Random Forest feature importance values. These integrations aim to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained from the WOS-5736, LitCovid, and MultiLabel datasets provide valuable insights into the superior effectiveness of eTNT compared to its counterpart, TextNetTopics. © 2024 Elsevier B.V., All rights reserved.
Linear Vs. Non-Linear Embedding Methods in Recommendation Systems
(Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Gurler, Kerem; Cos¸kun, Mustafa; Karagenc, Safak; Orun, Gokhan; Kuleli Pak, Burcu Kuleli; Güngör, Vehbi Çağrı; Coskun, Mustafa; Pak, Burcu Kuleli
Predicting customer interest in items is very crucial in direct marketing as it can potentially boost sales. Data mining techniques are developed to predict which items a particular user might be interested in based on their purchase history or explicit feedback in form of ratings or comments. Recently, non-linear and linear methods have been developed for this purpose. In this study, we applied Neighborhood based Collaborative Filtering (CF), Matrix Factorization (MF), Singular Value Decomposition (SVD), Neural Graph CF (NGCF) and Light Graph Convolutional Network (LightGCN) on explicit user product rating data which is acquired from the online gaming and mobile entertainment platform called HADI. We compared the results of node embedding methods in terms of Precision@k, Recall@k and NDCG@k values. SVD and LightGCN showed the best test performance and SVD was significantly superior to LightGCN in terms of training speed. To further increase predictive performance of SVD, we have applied classification with Logistic Regression and Deep Random Forest on user and item embeddings created by the SVD. © 2022 Elsevier B.V., All rights reserved.
Citation - Scopus: 5
Identifying Taxonomic Biomarkers of Colorectal Cancer in Human Intestinal Microbiota Using Multiple Feature Selection Methods
(Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Jabeer, Amhar; Kocak, Aysegul; Akkaş, Huseyin; Yenisert, Ferhan; Nalbantoĝlu, Özkan Ufuk; Yousef, Malik; Bakir-Güngör, Burcu; Bakir Gungor, Burcu
A variety of bacterial species called gut microbiota work together to maintain a steady intestinal environment. The gastrointestinal tract contains tremendous amount of different species including archaea, bacteria, fungi, and viruses. While these organisms are crucial immune system stabilizers, the dysbiosis of the intestinal flora has been related to gastrointestinal disorders including Colorectal cancer (CRC), intestinal cancer, irritable bowel syndrome and inflammatory bowel disease. In the last decade, next-generation sequencing (NGS) methods have accelerated the identification of human gut flora. CRC is a deathly condition that has been on the rise in the last century, affecting half a million people each year. Since early CRC diagnosis is critical for an effective treatment, there is an immediate requirement for a classification system that can expedite CRC diagnosis. In this study, via analyzing the available metagenomics data on CRC, we aim to facilitate the CRC diagnosis via finding biomarkers linked with CRC, and via building a classification model. We have obtained the metagenomic sequencing data of the healthy individuals and CRC patients from a metagenome-wide association analysis and we have classified this data according to the disease stages. Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Extreme Gradient Boosting (XGBoost), min redundancy max relevance (mRMR), Information Gain (IG) and Select K Best (SKB) feature selection algorithms were utilized to cope with the complexity of the features. We observed that the SKB, IG, and XGBoost techniques made significant contributions to decrease the microbiota in use for CRC diagnosis, thereby reducing cost and time. We realized that our Random Forest classifier outperformed Adaboost, Support Vector Machine, Decision Tree, Logitboost and stacking ensemble classifiers in terms of CRC classification performance. Our results reiterated some known and some potential microbiome associated mechanisms in CRC, which could aid the design of new diagnostics based on the microbiome. © 2022 Elsevier B.V., All rights reserved.
Citation - Scopus: 22
Assessing Employee Attrition Using Classifications Algorithms
(Association for Computing Machinery, 2020-05-15) Ozdemir, Fatma; Cos¸kun, Mustafa; Gezer, Cengiz; Güngör, Vehbi Çağrı; Coskun, Mustafa; Cagri Gungor, V.
Employees leave an organization when other organizations offer better opportunities than their current organizations. Continuity and sustenance and even completion of jobs are crucial issues for the companies not to suffer financial losses. Especially if the talented employees, who are at critical positions in the companies, leave the job, it becomes difficult for the organizations to maintain their businesses. Today, organizations would like to predict attrition of their employees and plan and prepare for it. However, the HR departments of organizations are not advanced enough to make such predictions in a handcrafted manner. For this reason, organizations are looking for new systems or methods that automatize the prediction of employee attrition utilizing data mining methods. In this study, we use IBM HR data set and apply different classification methods, such as Support Vector Machine (SVM), Random Forest, J48, LogitBoost, Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naive Bayes, Bagging, AdaBoost, Logistic Regression, to predict the employee attrition. Different from exiting studies, we systematically evaluate our findings with various classification metrics, such as F-measure, Area Under Curve, accuracy, sensitivity, and specificity. We observe that data mining methods can be useful for predicting the employee attrition. © 2022 Elsevier B.V., All rights reserved.
Population Specific Classification of Colorectal Cancer With Meta-Analysis of Metagenomic Data
(Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Temiz, Mustafa; Yousef, Malik; Bakir-Güngör, Burcu
Advances in next-generation sequencing and '-omics' technologies makes it possible to characterize the human gut microbiome. While some of these microorganisms are important regulators of our immune system, modulation of the microbiota leads to a variety of diseases. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental conditions, and abnormalities in the gut microbiota. Using various machine learning methods and meta-analysis techniques, this study aims to build a classification model that can help in CRC diagnosis by analyzing metagenomic datasets of different populations obtained at the species level. Using 8 different countries and 9 different metagenomic datasets, 3 different meta-analyzes are performed: within-population, cross-population, and one population is selected for testing and the rest is used as a training dataset (LODO). For CRC classification, 4 different classification algorithms (Random Forest (RF), Logitboost, Adaboost, and Decision Tree (DT)) are used. The best performance among these methods was obtained with the Random Forest algorithm with an AUC of 0.98 by using JP for the training data set and JPN populations for the test data set in the cross-population performance evaluation. © 2023 Elsevier B.V., All rights reserved.

Scopus İndeksli Yayınlar Koleksiyonu

Browse

Filters

Settings

Sort By

Results per page

Search Results