Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

  • Conference Object
    Ensemble Churn Prediction for Internet Service Provider with Machine Learning Techniques
    (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2020) Goy, Gokhan; Kolukisa, Burak; Bahcevan, Cenk; Gungor, Vehbi Cagri
    With technology developing in every field, a competitive marketing environment has arisen. In this competitive environment, analyzing customer behavior has become vital. In particular, customers' ability to switch service providers easily has become critical to a company's continued existence. At the same time, the financial resources spent on retaining customers are much smaller than those needed to obtain new clients. In this context, the traditional methods of examining the vast amounts of data obtained today for establishing decision support systems have lost their validity. In this study, we used a dataset provided by TurkNet, an internet service provider in Turkey. Various preprocessing steps were performed on this dataset, and then classification algorithms were run. The results were then obtained and compared. The results of these experiments were analyzed in terms of the area under the curve value. In this context, the most successful classifier was determined to be the Random Trees algorithm, with a value of 0.936.
  • Conference Object
    Multi-Method Text Summarization: Evaluating Extractive and BART-Based Approaches on CNN/Daily Mail
    (Institute of Electrical and Electronics Engineers Inc., 2025-06-27) Inal, Yasin; Bakal, Gokhan; Esit, Muhammed
    With the exponential growth of digital content, efficient text summarization has become increasingly crucial for managing information overload. This paper presents a comprehensive approach to text summarization using both extractive and abstractive methods, implemented on the CNN/Daily Mail dataset. We leverage pre-trained BART (Bidirectional and AutoRegressive Transformers) models and fine-tuning techniques to generate high-quality summaries. Our best model, trained on 287k samples, achieves a ROUGE-1 F1 score of 0.4174, a ROUGE-2 F1 score of 0.1932, and a ROUGE-L F1 score of 0.2910. We provide detailed comparisons between extractive methods and various BART model configurations, analyzing the impact of training dataset size and model architecture on summarization quality. Additionally, we share our implementation through an open-source NLP toolkit to facilitate further research and practical applications in the field. © 2025 Elsevier B.V., All rights reserved.
  • Article
    Citation - WoS: 3
    Citation - Scopus: 4
    Investigating the Carbon Border Adjustment Mechanism Transition Process With Linguistic Summarization Method: A Situational Analysis of Exporting Countries
    (Elsevier Sci Ltd, 2024-08) Fidan, Fatma Şener; Aydogan, Sena; Akay, Diyar
    The Paris Agreement holds significant importance since it establishes a global framework for addressing climate change and endeavors to mitigate the release of greenhouse gases. The Carbon Border Adjustment Mechanism was introduced as an integral component of this agreement, aiming to oversee the carbon emissions associated with imported items within the European Union and provide compensation for the emissions from the nations engaged in importation. It is essential to analyze the countries exporting to the European Union within the Carbon Border Adjustment Mechanism context to mitigate carbon leakage and effectively support the objectives outlined in the Paris Agreement. This research investigated 104 nations exporting to the 27 European Union member countries. The linguistic summarization method, a descriptive data analytics tool, was employed for the analysis. A total of 42 Combined Nomenclature codes were included in the evaluation for the transition phase of the Carbon Border Adjustment Mechanism. This study examines the characteristics of exporting nations based on three variables: the Environmental Performance Index, a sustainability indicator; the Region in which the countries are located, as classified by the World Bank; and the quantity of Renewable Energy Consumption. Additionally, the study explores the characteristics of EU countries, focusing on their Environmental Performance Index score and geography. The study employed fuzzy sets and the fuzzy c-means algorithm as parts of the linguistic summarization technique. Polyadic quantifiers were used to extract linguistic summaries, yielding 124,227 summaries, of which 1,594 have a truth degree exceeding 0.9. The findings were used to assess the influence of the linguistic summarization approach and offer a valuable viewpoint for decision-makers who lack expertise in this domain.
  • Conference Object
    Identify Commonly Affected Pathways in Psychiatric Diseases
    (Institute of Electrical and Electronics Engineers Inc., 2018-09) Bulut, Umit; Bakir-Güngör, Burcu
    Genome-wide association studies (GWAS) are an extraordinary source of information when it comes to revealing the common variations of complex human diseases. Until now, the large amount of data generated by these studies has not reached its full potential for identifying the molecular and functional framework needed to understand how a molecular system works. Following a more specific perspective, this study focuses on identifying the commonly affected pathways of psychiatric diseases. The term pathway, as used in molecular biology, denotes a simplified model of a process within the cell or tissue. Recently, several GWAS datasets have become publicly available for various disease types, such as psychiatric, immune-related, neurodegenerative, and cardiovascular diseases. Studying each disease and making pairwise comparisons to understand the behavior of the disease and the system would be time-consuming and exhausting. Instead of handling the results of these studies one by one, grouping diseases by target points is more efficient. This work aims to get one step closer to revealing the key points of diseases and targeting these points to develop personalized medicine approaches. Especially for complex diseases, a given drug does not have the same effect in every person. This paper covers the definition of molecular pathways, methods to identify disease-related pathways, and methods to find pathways shared pairwise among psychiatric diseases. © 2019 Elsevier B.V., All rights reserved.
  • Conference Object
    Generating Linguistic Advice for the Carbon Limit Adjustment Mechanism
    (Springer Science and Business Media Deutschland GmbH, 2023-10-02) Fidan, Fatma Şener; Aydogan, Sena; Akay, Diyar
    Linguistic summarization, a subfield of data mining, generates summaries in natural language for comprehending big data. This approach simplifies the incorporation of information into decision-making processes since no specialized knowledge is needed to understand the generated summaries. The present research employs linguistic summarization to examine the circumstances surrounding the Carbon Border Adjustment Mechanism, one of the most significant regulations confronting nations that export to the European Union, which will be adopted to support sustainable growth. In this paper, countries, associated with several attributes, and product flows from exporting countries to European countries were defined as nodes and relations, respectively. Before the modeling phase, fuzzy c-means automatically identified the fuzzy sets and membership degrees of the attributes. During the modeling phase, summary forms were generated using polyadic quantifiers. A total of 1944 linguistic summaries were produced between exporting countries and European countries. Thirty-five summaries have a truth degree greater than or equal to the threshold value of 0.9, which is considered reasonable. The provision of natural language descriptions of the Carbon Border Adjustment Mechanism is intended to aid decision-makers and policymakers in their deliberations. © 2023 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 4
    Data Mining Techniques in Direct Marketing on Imbalanced Data Using Tomek Link Combined With Random Under-Sampling
    (Assoc Computing Machinery, 2021-05-27) Yilmaz, Umit; Gezer, Cengiz; Aydin, Zafer; Gungor, V. Cagri
    Determining potential customers is very important in direct marketing. Data mining techniques are among the most important methods companies use to identify potential customers. However, since the number of potential customers is very low compared to the number of non-potential customers, there is a class imbalance problem that significantly affects the performance of data mining techniques. In this paper, different combinations of basic and advanced resampling techniques, such as Synthetic Minority Over-sampling Technique (SMOTE), Tomek Link, Random Under-Sampling (RUS), and Random Over-Sampling (ROS), were evaluated to improve the performance of customer classification. Different feature selection techniques, such as Information Gain, Gain Ratio, Chi-squared, and Relief, were used to decrease the number of non-informative features in the data. Classification performance was compared across several data mining techniques, namely LightGBM, XGBoost, Gradient Boost, Random Forest, AdaBoost, ANN, Logistic Regression, Decision Trees, SVC, and Bagging Classifier, based on ROC AUC and sensitivity metrics. A combination of Tomek Link and Random Under-Sampling as the resampling technique and the Chi-squared method as the feature selection algorithm outperformed the other combinations. Detailed performance evaluations demonstrated that, with the proposed approach, LightGBM, a gradient boosting algorithm based on decision trees, gave the best results among the classifiers, with 0.947 sensitivity and 0.896 ROC AUC.
  • Conference Object
    Citation - Scopus: 7
    A Comparative Analysis on Medical Article Classification Using Text Mining & Machine Learning Algorithms
    (Institute of Electrical and Electronics Engineers Inc., 2021-09-15) Kolukisa, Burak; Dedeturk, Bilge Kagan; Dedeturk, Beyhan Adanur; Gulsen, Abdulkadir; Bakal, Gokhan
    Document classification is a widely studied research problem across multiple domains. The core motivation is that manual classification is impractical given exponentially growing document volumes. Thus, we strongly need to exploit automated computational approaches, such as machine learning models along with data & text mining techniques. In this study, we concentrated on the classification of medical articles, specifically on common cancer types, due to the significance of the field and the decent number of available documents of interest. We deliberately targeted MEDLINE articles about common cancer types because most cancer types share a similar literature composition, which makes the classification effort relatively more complicated. To this end, we built multiple machine learning models, including both traditional and deep learning architectures. We achieved the best performance (R¿82% F score) with the LSTM model. Overall, our results demonstrate a strong effect of exploiting both text mining and machine learning methods to distinguish medical articles on common cancer types. © 2022 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - WoS: 3
    Citation - Scopus: 13
    NSEM: A Novel Stacked Ensemble Method for Sentiment Analysis [NSEM: Duygu Analizi için Özgün Yığınlanmış Topluluk Yöntemi]
    (Institute of Electrical and Electronics Engineers Inc., 2018-09) Işik, Yunus Emre; Görmez, Yasin; Kaynar, Oğuz; Aydin, Zafer
    Today, people often share their ideas, opinions and feelings through forums, social media sites, blogs and similar platforms. For this reason, access to these data has become very easy. The growing number of shared posts makes it possible to analyze and use these data for marketing and politics. However, due to the sheer volume of data, it is impossible for humans to perform this analysis. Sentiment analysis methods automatically determine which type of emotion a text contains. In these methods, the text is represented as a mathematical vector and classified by machine learning methods. Ensemble methods are among the most important classifiers used in sentiment analysis. In these methods, one classifier's errors are corrected by another classifier. In sentiment analysis, the feature vector that describes the text is as important as the classifier. Feature vectors obtained using different methods can make mistakes in different places. For this reason, this study proposes NSEM for sentiment analysis, a new ensemble method that uses 2 different classifiers and 2 different feature extraction methods. In the analysis, the proposed method is the most successful method, with an accuracy rate of 79.1%. © 2019 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - Scopus: 2
    Customer Churn Prediction for an Internet Service Provider with Machine Learning Techniques [Makine Öğrenmesi Teknikleri ile İnternet Servis Sağlayıcısı için Müşteri Kayıp Tahmini]
    (Institute of Electrical and Electronics Engineers Inc., 2020-09) Göy, Gökhan; Kolukisa, Burak; Bahçevan, Cenk Anıl; Güngör, Vehbi Çağrı
    With technology developing in every field, a competitive marketing environment has arisen. In this competitive environment, analyzing customer behavior has become vital. In particular, customers' ability to switch service providers easily has become critical to a company's continued existence. At the same time, the financial resources spent on retaining customers are much smaller than those needed to obtain new clients. In this context, the traditional methods of examining the vast amounts of data obtained today for establishing decision support systems have lost their validity. In this study, we used a dataset provided by TurkNet, an internet service provider in Turkey. Various preprocessing steps were performed on this dataset, and then classification algorithms were run. The results were then obtained and compared. The results of these experiments were analyzed in terms of the area under the curve value. In this context, the most successful classifier was determined to be the Random Trees algorithm, with a value of 0.936. © 2020 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - Scopus: 1
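    An Ensemble Feature Selection Method Incorporating Domain Knowledge for Coronary Artery Disease Diagnosis [Koroner Arter Hastalığı Tanısı İçin Alan Bilgisi İçeren Topluluk Öznitelik Seçim Yöntemi]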
    (Institute of Electrical and Electronics Engineers Inc., 2020-10-05) Kolukisa, Burak; Güngör, Vehbi Çağrı; Bakir-Güngör, Burcu
    Coronary Artery Disease (CAD) is a condition in which the heart is not adequately nourished as a result of the accumulation of fatty matter called atheroma in the walls of the arteries. In 2016, CAD accounted for 31% (17.9 million) of the world's total deaths, and its diagnosis is difficult. It is estimated that approximately 23.6 million people will die from this disease in 2030. With the development of machine learning and data mining techniques, it might be possible to diagnose CAD inexpensively and easily by examining some physical and biochemical values. In this study, a novel ensemble feature selection methodology that incorporates domain knowledge is proposed for the CAD classification problem. The proposed methodology is applied to the UCI Cleveland CAD dataset, and performance metrics are compared across different classification algorithms. In our experiments, when the Multilayer Perceptron classifier is used with 9 selected features, our proposed solution reached 85.47% accuracy, 82.96% accuracy and 0.839 F-Measure. As future work, we aim to generate a machine learning model that can quickly diagnose CAD on real-time data in hospitals. © 2021 Elsevier B.V., All rights reserved.
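
The resampling combination named in the direct-marketing entry above (Tomek Link followed by Random Under-Sampling) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the distance metric, the target class ratio, or the library used, so the code below assumes binary labels, Euclidean distance, and balancing to a 1:1 ratio, and it uses only the Python standard library. All function names are hypothetical.

```python
import math
import random


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)  # Euclidean distance (an assumption)
        if d < best_dist:
            best, best_dist = j, d
    return best


def tomek_link_majority_indices(points, labels, majority):
    """Return indices of majority-class samples that belong to a Tomek link."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # A Tomek link is a pair of mutual nearest neighbors with different labels.
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority else j)
    return drop


def tomek_then_random_undersample(points, labels, majority, seed=0):
    """Drop Tomek-link majority samples, then randomly under-sample to 1:1."""
    drop = tomek_link_majority_indices(points, labels, majority)
    keep = [i for i in range(len(points)) if i not in drop]
    minority_idx = [i for i in keep if labels[i] != majority]
    majority_idx = [i for i in keep if labels[i] == majority]
    # Random under-sampling: keep as many majority samples as minority samples.
    rng = random.Random(seed)
    n_keep = min(len(majority_idx), len(minority_idx))
    majority_idx = rng.sample(majority_idx, n_keep)
    chosen = sorted(minority_idx + majority_idx)
    return [points[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: one minority sample (label 1) sitting next to a majority sample.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (9.0, 9.0), (9.1, 9.0)]
labels = [1, 0, 0, 0, 0, 0]
resampled_points, resampled_labels = tomek_then_random_undersample(points, labels, majority=0)
```

In the toy data, the majority sample at (0.1, 0.0) forms a Tomek link with the lone minority sample and is removed first; the remaining majority samples are then thinned at random until the classes are balanced. The brute-force neighbor search is quadratic in the number of samples, which is acceptable for illustration only.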