Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Citation - Scopus: 1
The Identification of Discriminative Single Nucleotide Polymorphism Sets for the Classification of Behçet's Disease
(Institute of Electrical and Electronics Engineers Inc., 2018-09) Görmez, Yasin; Işik, Yunus Emre; Bakir-Güngör, Burcu
Behçet's disease is a long-term multisystem inflammatory disorder characterized by recurrent attacks affecting several organs. As genotyping individuals became cheaper and easier with advances in genomic technologies, genome-wide association studies (GWAS) emerged. By studying large case-control groups for a specific disease, these studies identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). Although GWAS scanning around a million SNPs have identified several genetic risk factors for Behçet's disease, these variations could explain only up to 20% of the disease's genetic risk. In this study, for Behçet's disease classification, we compare all the SNPs genotyped in GWAS with the SNPs selected using genetic knowledge, gain ratio and information gain, aiming both to reduce the feature size and to improve the classification accuracy. We also investigate the effects of different classification algorithms, such as random forest, k-nearest neighbour and logistic regression, on the classification accuracy. Our results showed that, compared to other feature selection methods, selecting the SNPs using the genetic information (their GWAS p-values, which indicate the significance of the SNP for the disease) achieves a success rate of at least 81% and provides a 15% to 42% improvement in all classification algorithms. This improvement is statistically sound. While the gain ratio and information gain feature selection techniques yield similar classification accuracies, the models using all SNPs could not exceed 50% accuracy and resulted in the worst performance. © 2019 Elsevier B.V., All rights reserved.
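The two generic feature-scoring criteria named in the abstract above, information gain and gain ratio, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy genotype vectors (`snp_a`, `snp_b`), the case/control `labels`, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def information_gain(feature, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: genotypes coded 0/1/2, labels 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
snp_a = [2, 2, 2, 0, 0, 0]   # separates cases from controls perfectly
snp_b = [0, 1, 0, 1, 0, 1]   # nearly uninformative

scores = {"snp_a": information_gain(snp_a, labels),
          "snp_b": information_gain(snp_b, labels)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Ranking features by such a score and keeping the top ones is the usual way these criteria are applied for feature reduction; the number of features to keep is a separate choice not shown here.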
Citation - Scopus: 10
On Comparative Classification of Relevant COVID-19 Tweets
(Institute of Electrical and Electronics Engineers Inc., 2021-09-15) Bakal, Gokhan; Abar, Orhan
Due to the impressive information dissemination power of social networks such as Twitter, people tend to check social networks and Web pages more than traditional news sources, including newspapers, TV news programs, or radio channels. In that sense, the information carried by shared social media posts becomes much more significant. However, most posts are either irrelevant or inaccurate. Moreover, even more critical than the correctness of the information is the speed at which it diffuses on Twitter through reply or retweet actions. These activities complicate the situation further because of the unregulated nature of social networks and the lack of an immediate mechanism for verifying the correctness of posts. During the current Covid-19 pandemic (caused by the coronavirus disease), Twitter is one of the most utilized information resources apart from the official health administration institutions. Consequently, examining the correctness of information related to the Covid-19 pandemic with computational techniques (e.g., Data Mining, Machine Learning, and Deep Learning) has been gaining popularity and remains a substantial task. Hence, we mainly focused on analyzing the correctness of posts related to the current pandemic shared on the Twitter platform. The overall goal of this work is to classify the relevant tweets using linear and non-linear machine learning models. Among all model configurations, we achieved the best F1 performance score (99%) with the neural network model using unigram features and a threshold value of 50. © 2022 Elsevier B.V., All rights reserved.
Multi-Method Text Summarization: Evaluating Extractive and BART-Based Approaches on CNN/Daily Mail
(Institute of Electrical and Electronics Engineers Inc., 2025-06-27) Inal, Yasin; Bakal, Gokhan; Esit, Muhammed
With the exponential growth of digital content, efficient text summarization has become increasingly crucial for managing information overload. This paper presents a comprehensive approach to text summarization using both extractive and abstractive methods, implemented on the CNN/Daily Mail dataset. We leverage pre-trained BART (Bidirectional and Auto-Regressive Transformers) models and fine-tuning techniques to generate high-quality summaries. Our approach demonstrates significant improvements, with our best model, trained on 287k samples, achieving a ROUGE-1 F1 score of 0.4174, a ROUGE-2 F1 score of 0.1932, and a ROUGE-L F1 score of 0.2910. We provide detailed comparisons between extractive methods and various BART model configurations, analyzing the impact of training dataset size and model architecture on summarization quality. Additionally, we share our implementation through an open-source NLP toolkit to facilitate further research and practical applications in the field. © 2025 Elsevier B.V., All rights reserved.
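For readers unfamiliar with the ROUGE-N F1 scores reported above, the following is a simplified sketch of the n-gram overlap computation. It assumes lowercased whitespace tokenization with no stemming, so it will not reproduce scores from standard ROUGE packages exactly; the function names and the example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between a candidate and a reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips repeated n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example sentences.
candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
rouge1 = rouge_n_f1(candidate, reference, n=1)
rouge2 = rouge_n_f1(candidate, reference, n=2)
print(rouge1, rouge2)
```

ROUGE-L, also reported in the abstract, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.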
Linear Vs. Non-Linear Embedding Methods in Recommendation Systems
(Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Gurler, Kerem; Coşkun, Mustafa; Karagenc, Safak; Orun, Gokhan; Pak, Burcu Kuleli; Güngör, Vehbi Çağrı
Predicting customer interest in items is crucial in direct marketing, as it can potentially boost sales. Data mining techniques are developed to predict which items a particular user might be interested in, based on their purchase history or on explicit feedback in the form of ratings or comments. Recently, both linear and non-linear methods have been developed for this purpose. In this study, we applied Neighborhood-based Collaborative Filtering (CF), Matrix Factorization (MF), Singular Value Decomposition (SVD), Neural Graph CF (NGCF) and Light Graph Convolutional Network (LightGCN) to explicit user-product rating data acquired from the online gaming and mobile entertainment platform HADI. We compared the results of the node embedding methods in terms of Precision@k, Recall@k and NDCG@k values. SVD and LightGCN showed the best test performance, and SVD was significantly superior to LightGCN in terms of training speed. To further increase the predictive performance of SVD, we applied classification with Logistic Regression and Deep Random Forest to the user and item embeddings created by SVD. © 2022 Elsevier B.V., All rights reserved.
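The three ranking metrics named in the abstract above can be sketched as follows. This is an illustrative version assuming binary relevance (an item is either relevant or not); the study's exact definitions, for example graded relevance derived from ratings, are not given in the abstract, and the item identifiers below are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain of the top k, normalised by the ideal ordering."""
    dcg = sum(1 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical recommendation list and ground-truth set for one user.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(ranked, relevant, 3),
      recall_at_k(ranked, relevant, 3),
      ndcg_at_k(ranked, relevant, 3))
```

In practice these values are computed per user and then averaged over all test users.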
Identify Commonly Affected Pathways in Psychiatric Diseases
(Institute of Electrical and Electronics Engineers Inc., 2018-09) Bulut, Umit; Bakir-Güngör, Burcu
Genome-wide association studies (GWAS) are an extraordinary source of information for revealing the common variations underlying complex human diseases. Until now, the large amount of data generated by these studies has not shown its full potential in identifying the molecular and functional framework needed to understand how a molecular system works. Taking a more specific perspective, this study focuses on identifying the commonly affected pathways of psychiatric diseases. The term pathway, as used in molecular biology, denotes a simplified model of a process within a cell or tissue. Recently, several GWAS datasets have become publicly available for various disease types, such as psychiatric, immune-related, neurodegenerative and cardiovascular diseases. Studying each disease separately and comparing the diseases pairwise to understand the behavior of the disease and the system would be time-consuming and exhausting. Instead of handling the results of these studies one by one, grouping diseases by target points is more efficient. This work aims to get one step closer to revealing the key points of diseases and targeting these points to develop personalized medicine approaches. Especially for complex diseases, a drug does not show the same effect in every person. This paper covers the definition of molecular pathways, methods for identifying disease-related pathways, and methods for finding pathways shared pairwise among psychiatric diseases. © 2019 Elsevier B.V., All rights reserved.
Generating Linguistic Advice for the Carbon Limit Adjustment Mechanism
(Springer Science and Business Media Deutschland GmbH, 2023-10-02) Fidan, Fatma Şener; Aydogan, Sena; Akay, Diyar
Linguistic summarization, a subfield of data mining, generates natural-language summaries that help in comprehending big data. This approach simplifies the incorporation of information into decision-making processes, since no specialized knowledge is needed to understand the generated summaries. The present research employs linguistic summarization to examine the circumstances surrounding the Carbon Border Adjustment Mechanism, one of the most significant regulations confronting nations that export to the European Union, which will be adopted to support sustainable growth. In this paper, countries, each associated with several attributes, and product flows from exporting countries to European countries were defined as nodes and relations, respectively. Before the modeling phase, fuzzy c-means automatically identified the fuzzy sets and membership degrees of the attributes. During the modeling phase, summary forms were generated using polyadic quantifiers. A total of 1944 linguistic summaries were produced between exporting countries and European countries. Thirty-five summaries have a truth degree greater than or equal to the threshold value of 0.9, which is considered reasonable. The natural-language descriptions of the Carbon Border Adjustment Mechanism are intended to aid decision-makers and policymakers in their deliberations. © 2023 Elsevier B.V., All rights reserved.
Citation - WoS: 23
Citation - Scopus: 52
Evaluation of Classification Algorithms, Linear Discriminant Analysis and a New Hybrid Feature Selection Methodology for the Diagnosis of Coronary Artery Disease
(Institute of Electrical and Electronics Engineers Inc., 2018-12) Kolukisa, Burak; Hacilar, Hilal; Göy, Gökhan; Kus, Mustafa; Bakir-Güngör, Burcu; Aral, Atilla; Güngör, Vehbi Çağrı
According to the World Health Organization (WHO), 31% of the world's total deaths in 2016 (17.9 million) were due to cardiovascular diseases (CVD). With the development of information technologies, it has become possible to predict, at a lower cost, whether people have heart disease by checking certain physical and biochemical values. In this study, we evaluated a set of different classification algorithms and linear discriminant analysis, and proposed a new hybrid feature selection methodology for the diagnosis of coronary heart disease (CHD). Throughout this research effort, using three publicly available heart disease diagnosis datasets (UCI Machine Learning Repository), we conducted comparative performance evaluations in terms of accuracy, sensitivity, specificity, F-measure, AUC and running time. © 2023 Elsevier B.V., All rights reserved.
Citation - WoS: 2
Citation - Scopus: 4
Data Mining Techniques in Direct Marketing on Imbalanced Data Using Tomek Link Combined With Random Under-Sampling
(Association for Computing Machinery, 2021-05-27) Yilmaz, Umit; Gezer, Cengiz; Aydin, Zafer; Gungor, V. Cagri
Determining potential customers is very important in direct marketing, and data mining techniques are among the most important methods companies use for this purpose. However, since the number of potential customers is very low compared to the number of non-potential customers, there is a class imbalance problem that significantly affects the performance of data mining techniques. In this paper, different combinations of basic and advanced resampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE), Tomek Link, Random Under-Sampling (RUS), and Random Over-Sampling (ROS), were evaluated to improve the performance of customer classification. Different feature selection techniques, such as Information Gain, Gain Ratio, Chi-squared, and Relief, were used to decrease the number of non-informative features in the data. Classification performance was compared across several data mining techniques, namely LightGBM, XGBoost, Gradient Boost, Random Forest, AdaBoost, ANN, Logistic Regression, Decision Trees, SVC, and Bagging Classifier, based on ROC AUC and sensitivity metrics. The combination of Tomek Link and Random Under-Sampling as the resampling technique with the Chi-squared method as the feature selection algorithm showed superior performance over the other combinations. Detailed performance evaluations demonstrated that, with the proposed approach, LightGBM, a gradient boosting algorithm based on decision trees, gave the best results among the classifiers, with 0.947 sensitivity and a 0.896 ROC AUC value.
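The resampling combination highlighted in the abstract above, Tomek Link removal followed by random under-sampling, can be sketched in pure Python. This is a toy illustration for a two-class problem under stated assumptions, not the authors' implementation: a Tomek link is taken to be a pair of opposite-class points that are each other's nearest neighbour, only the majority-class member of each link is dropped, and the majority class is then sampled down to the minority size. All names and data points are hypothetical.

```python
import math
import random

def nearest_neighbor(i, points):
    """Index of the point closest to points[i] (Euclidean distance)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def tomek_link_majority_indices(points, labels, majority):
    """Majority-class indices that belong to a Tomek link."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        mutual = nn[j] == i
        if mutual and labels[i] != labels[j] and labels[i] == majority:
            drop.add(i)
    return drop

def tomek_then_rus(points, labels, majority, seed=0):
    """Remove Tomek-link majority points, then under-sample the majority class."""
    drop = tomek_link_majority_indices(points, labels, majority)
    keep = [i for i in range(len(points)) if i not in drop]
    minority_idx = [i for i in keep if labels[i] != majority]
    majority_idx = [i for i in keep if labels[i] == majority]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = rng.sample(majority_idx, min(len(majority_idx), len(minority_idx)))
    selected = sorted(minority_idx + sampled)
    return [points[i] for i in selected], [labels[i] for i in selected]

# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0),
          (3.0, 0.0), (3.2, 0.0), (5.0, 0.0), (5.3, 0.0)]
labels = [0, 0, 0, 1, 0, 0, 1, 0]
X_res, y_res = tomek_then_rus(points, labels, majority=0)
print(X_res, y_res)
```

The brute-force neighbour search is quadratic in the number of points, which is fine for a toy example; maintained resampling libraries are the more realistic choice for data sets of practical size.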
Citation - Scopus: 22
Assessing Employee Attrition Using Classifications Algorithms
(Association for Computing Machinery, 2020-05-15) Ozdemir, Fatma; Coşkun, Mustafa; Gezer, Cengiz; Güngör, Vehbi Çağrı
Employees leave an organization when other organizations offer better opportunities than their current one. Continuity, sustenance and even completion of jobs are crucial for companies that want to avoid financial losses. Especially when talented employees in critical positions leave, it becomes difficult for organizations to maintain their businesses. Today, organizations would like to predict the attrition of their employees so that they can plan and prepare for it. However, the HR departments of organizations are not advanced enough to make such predictions manually. For this reason, organizations are looking for new systems or methods that automate the prediction of employee attrition using data mining methods. In this study, we use the IBM HR data set and apply different classification methods, namely Support Vector Machine (SVM), Random Forest, J48, LogitBoost, Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naive Bayes, Bagging, AdaBoost, and Logistic Regression, to predict employee attrition. Unlike existing studies, we systematically evaluate our findings with various classification metrics, such as F-measure, Area Under Curve, accuracy, sensitivity, and specificity. We observe that data mining methods can be useful for predicting employee attrition. © 2022 Elsevier B.V., All rights reserved.
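Several of the evaluation metrics listed in the abstract above follow directly from the binary confusion matrix. The sketch below shows the standard definitions of accuracy, sensitivity, specificity and F-measure; the function name and the toy predictions are hypothetical, and Area Under Curve is omitted because it requires classifier scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier (positive = attrition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure}

# Hypothetical labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data such as attrition records, accuracy alone can look high while sensitivity stays low, which is one reason to report the metrics together.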
A Data Mining Method for Refining Groups in Data Using Dynamic Model Based Clustering
(IEEE, 2013-06) Servi, Tayfun; Erol, H.
A new data mining method is proposed for determining the number and structure of clusters and for refining groups in a multivariate heterogeneous data set that includes partly and completely overlapped group structures, using dynamic model-based clustering. It is called dynamic model-based clustering because the structure of the model changes dynamically at each stage of the refinement process. The proposed data mining method works without data reduction for high-dimensional data in which some of the variables include completely overlapped situations. © 2013 IEEE. © 2013 Elsevier B.V., All rights reserved.
