Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Now showing 1 - 9 of 9
  • Article
    Citation - Scopus: 6
    Network Intrusion Detection Based on Machine Learning Strategies: Performance Comparisons on Imbalanced Wired, Wireless, and Software-Defined Networking (SDN) Network Traffics
    (Turkiye Klinikleri, 2024-07-26) Hacilar, Hilal; Aydin, Zafer; Güngör, Vehbi Çağrı
    The rapid growth of computer networks emphasizes the urgency of addressing security issues. Organizations rely on network intrusion detection systems (NIDSs) to protect sensitive data from unauthorized access and theft. These systems analyze network traffic to detect suspicious activities, such as attempted breaches or cyberattacks. However, existing studies lack a thorough assessment of class imbalances and classification performance for different types of network intrusions: wired, wireless, and software-defined networking (SDN). This research aims to fill this gap by examining these networks’ imbalances, feature selection, and binary classification to enhance intrusion detection system efficiency. Various techniques such as SMOTE, ROS, ADASYN, and SMOTETomek are used to handle imbalanced datasets. Additionally, eXtreme Gradient Boosting (XGBoost) identifies key features, and an autoencoder (AE) assists in feature extraction for the classification task. The study evaluates datasets such as AWID, UNSW, and InSDN, yielding the best results with different numbers of selected features. Bayesian optimization fine-tunes parameters, and diverse machine learning algorithms (SVM, kNN, XGBoost, random forest, ensemble classifiers, and autoencoders) are employed. The optimal results, considering F1-measure, overall accuracy, detection rate, and false alarm rate, have been achieved for the UNSW-NB15, preprocessed AWID, and InSDN datasets, with values of [0.9356, 0.9289, 0.9328, 0.07597], [0.997, 0.9995, 0.9999, 0.0171], and [0.9998, 0.9996, 0.9998, 0.0012], respectively. These findings demonstrate that combining Bayesian optimization with oversampling techniques significantly enhances classification performance across wired, wireless, and SDN networks when compared to previous research conducted on these datasets. © 2024 Elsevier B.V., All rights reserved.
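The four reported metrics (F1-measure, overall accuracy, detection rate, false alarm rate) can all be derived from a binary confusion matrix. The sketch below assumes the common textbook definitions, with "attack" as the positive class, detection rate as recall, and false alarm rate as the false positive rate. The abstract does not state its exact formulas, and the function name and counts are illustrative only.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return F1, accuracy, detection rate, and false alarm rate.

    Assumption: the positive class is "attack" and the definitions are the
    usual textbook ones, not necessarily those used in the paper.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0    # recall (TPR)
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0  # FPR
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return {
        "f1": f1,
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
    }


# Hypothetical counts, chosen only to illustrate the call.
metrics = binary_metrics(tp=90, fp=10, tn=90, fn=10)
```

On imbalanced traffic, accuracy alone can look high while the detection rate is poor, which is presumably why the study reports all four values together.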
  • Conference Object
    Citation - Scopus: 5
    Identifying Taxonomic Biomarkers of Colorectal Cancer in Human Intestinal Microbiota Using Multiple Feature Selection Methods
    (Institute of Electrical and Electronics Engineers Inc., 2022-09-07) Jabeer, Amhar; Kocak, Aysegul; Akkaş, Huseyin; Yenisert, Ferhan; Nalbantoğlu, Özkan Ufuk; Yousef, Malik; Bakir-Güngör, Burcu
    A variety of bacterial species called gut microbiota work together to maintain a steady intestinal environment. The gastrointestinal tract contains a tremendous number of different species, including archaea, bacteria, fungi, and viruses. While these organisms are crucial immune system stabilizers, dysbiosis of the intestinal flora has been related to gastrointestinal disorders including colorectal cancer (CRC), intestinal cancer, irritable bowel syndrome, and inflammatory bowel disease. In the last decade, next-generation sequencing (NGS) methods have accelerated the identification of human gut flora. CRC is a deadly condition that has been on the rise in the last century, affecting half a million people each year. Since early CRC diagnosis is critical for effective treatment, there is an immediate need for a classification system that can expedite CRC diagnosis. In this study, by analyzing the available metagenomics data on CRC, we aim to facilitate CRC diagnosis by finding biomarkers linked with CRC and by building a classification model. We obtained the metagenomic sequencing data of healthy individuals and CRC patients from a metagenome-wide association analysis and classified these data according to the disease stages. Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Extreme Gradient Boosting (XGBoost), min redundancy max relevance (mRMR), Information Gain (IG), and Select K Best (SKB) feature selection algorithms were utilized to cope with the complexity of the features. We observed that the SKB, IG, and XGBoost techniques made significant contributions to reducing the microbiota used for CRC diagnosis, thereby reducing cost and time. We found that our Random Forest classifier outperformed the Adaboost, Support Vector Machine, Decision Tree, Logitboost, and stacking ensemble classifiers in terms of CRC classification performance. Our results reiterated some known and some potential microbiome-associated mechanisms in CRC, which could aid the design of new diagnostics based on the microbiome. © 2022 Elsevier B.V., All rights reserved.
  • Article
    Citation - Scopus: 8
    Building a Challenging Medical Dataset for Comparative Evaluation of Classifier Capabilities
    (Elsevier Ltd, 2024-08) Bozkurt, Berat; Coskun, Kerem; Bakal, Gokhan
    Since the 2000s, digitalization has been a crucial transformation in our lives. Nevertheless, digitalization brings a bulk of unstructured textual data to be processed, including articles, clinical records, web pages, and shared social media posts. As a critical analysis, the classification task classifies the given textual entities into correct categories. Categorizing documents from different domains is straightforward since the instances are unlikely to contain similar contexts. However, document classification in a single domain is more complicated due to sharing the same context. Thus, we aim to classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) by constructing machine learning and deep learning models. We used 383,914 medical articles about four common cancer types collected by the PubMed API. To build classification models, we split the dataset into 70% as training, 20% as testing, and 10% as validation. We built widely used machine-learning (Logistic Regression, XGBoost, CatBoost, and Random Forest Classifiers) and modern deep-learning (convolutional neural networks - CNN, long short-term memory - LSTM, and gated recurrent unit - GRU) models. We computed the average classification performances (precision, recall, F-score) to evaluate the models over ten distinct dataset splits. The best-performing deep learning model(s) yielded a superior F1 score of 98%. However, traditional machine learning models also achieved reasonably high F1 scores, 95% for the worst-performing case. Ultimately, we constructed multiple models to classify articles, which compose a hard-to-classify dataset in the medical domain. © 2024 Elsevier B.V., All rights reserved.
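The evaluation protocol described above (a 70/20/10 train/test/validation split, averaged over ten distinct splits) can be sketched as follows. This is an illustration under assumptions: the seeds, function names, and the `evaluate` callback are hypothetical, and the abstract does not say how its ten splits were generated.

```python
import random


def split_70_20_10(items, seed):
    """Shuffle a copy of `items` with `seed` and split it 70/20/10."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10  # integer arithmetic avoids float rounding
    n_test = n * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, about 10%
    return train, test, validation


def mean_score(items, evaluate, n_splits=10):
    """Average `evaluate(train, test, validation)` over `n_splits` seeds.

    `evaluate` is a hypothetical callback that trains a model and returns
    one score (for example an F-score) for the given split.
    """
    scores = [evaluate(*split_70_20_10(items, seed)) for seed in range(n_splits)]
    return sum(scores) / len(scores)
```

Using a different seed per split gives different partitions of the same articles, so the averaged score is less sensitive to one lucky or unlucky split.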
  • Article
    Citation - Scopus: 15
    An Effective Colorectal Polyp Classification for Histopathological Images Based on Supervised Contrastive Learning
    (Elsevier Ltd, 2024-04) Yengec-Tasdemir, Sena Busra; Aydin, Zafer; Akay, Ebru; Doğan, Serkan; Yilmaz, Bulent
    Early detection of colon adenomatous polyps is pivotal in reducing colon cancer risk. In this context, accurately distinguishing between adenomatous polyp subtypes, especially tubular and tubulovillous, from hyperplastic variants is crucial. This study introduces a cutting-edge computer-aided diagnosis system optimized for this task. Our system employs advanced Supervised Contrastive learning to ensure precise classification of colon histopathology images. Significantly, we have integrated the Big Transfer model, which has gained prominence for its exemplary adaptability to visual tasks in medical imaging. Our novel approach discerns between in-class and out-of-class images, thereby elevating its discriminatory power for polyp subtypes. We validated our system using two datasets: a specially curated one and the publicly accessible UniToPatho dataset. The results reveal that our model markedly surpasses traditional deep convolutional neural networks, registering classification accuracies of 87.1% and 70.3% for the custom and UniToPatho datasets, respectively. Such results emphasize the transformative potential of our model in polyp classification endeavors. © 2024 Elsevier B.V., All rights reserved.
  • Conference Object
    Citation - Scopus: 19
    A Novel Feature Design and Stacking Approach for Non-Technical Electricity Loss Detection
    (Institute of Electrical and Electronics Engineers Inc., 2018-05) Aydin, Zafer; Güngör, Vehbi Çağrı
    Non-technical electricity losses continue to jeopardize the economic and social well-being of many countries. In this work, we develop machine learning classifiers that can identify anomalous electricity consumption in Turkey. Starting from weekly electricity usage data, we develop new features that capture statistical and frequency-domain characteristics of the customers and their consumption patterns. We analyze the effect of reducing the number of feature descriptors through dimensionality reduction and feature selection techniques. To overcome the class imbalance problem, we implement several ensemble methods and compare their prediction accuracy to that of standard classifiers. The proposed features, together with combining the strengths of different classifiers, bring significant improvements in performance metrics, as demonstrated through detailed simulations on the shopping mall sector. We anticipate that advances in this field will contribute considerably to the economies. © 2018 Elsevier B.V., All rights reserved.
  • Conference Object
    Population Specific Classification of Colorectal Cancer With Meta-Analysis of Metagenomic Data
    (Institute of Electrical and Electronics Engineers Inc., 2023-10-11) Temiz, Mustafa; Yousef, Malik; Bakir-Güngör, Burcu
    Advances in next-generation sequencing and '-omics' technologies make it possible to characterize the human gut microbiome. While some of these microorganisms are important regulators of our immune system, modulation of the microbiota leads to a variety of diseases. Colorectal cancer (CRC), the third most common cancer worldwide, is caused by genetic mutations, environmental conditions, and abnormalities in the gut microbiota. Using various machine learning methods and meta-analysis techniques, this study aims to build a classification model that can help in CRC diagnosis by analyzing metagenomic datasets of different populations obtained at the species level. Using 9 different metagenomic datasets from 8 different countries, 3 different meta-analyses are performed: within-population, cross-population, and LODO, in which one population is selected for testing and the rest are used as the training dataset. For CRC classification, 4 different classification algorithms (Random Forest (RF), Logitboost, Adaboost, and Decision Tree (DT)) are used. The best performance among these methods was obtained with the Random Forest algorithm with an AUC of 0.98 by using JP for the training data set and JPN populations for the test data set in the cross-population performance evaluation. © 2023 Elsevier B.V., All rights reserved.
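The LODO scheme mentioned above (presumably "leave-one-dataset-out", an expansion the abstract does not spell out) holds out one population for testing and trains on all the others. A minimal sketch of how such folds could be generated, with a hypothetical function name and toy data:

```python
def leave_one_dataset_out(datasets):
    """Yield (held_out_name, train_samples, test_samples) per dataset.

    `datasets` maps a dataset (population) name to its list of samples.
    Each dataset is held out once; all remaining ones are pooled for training.
    """
    for held_out in datasets:
        train = [
            sample
            for name, samples in datasets.items()
            if name != held_out
            for sample in samples
        ]
        test = list(datasets[held_out])
        yield held_out, train, test


# Toy example with placeholder population names and integer "samples".
toy = {"pop_a": [1, 2], "pop_b": [3], "pop_c": [4, 5]}
folds = list(leave_one_dataset_out(toy))
```

Each fold measures how well a model trained on other populations transfers to an unseen one, which is a stricter test than a within-population split.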
  • Conference Object
    Citation - Scopus: 3
    İki Durumlu Bir Beyin Bilgisayar Arayüzünde Özellik Çıkarımı ve Sınıflandırma [Feature Extraction and Classification in a Two-State Brain-Computer Interface]
    (Institute of Electrical and Electronics Engineers Inc., 2016-10) Altindis, Fatih; Yilmaz, Bulent
    Brain Computer Interface (BCI) technology is used to help patients who do not have control over their motor neurons, such as ALS or paralyzed patients, communicate with the outer world. This work aims to classify motor imageries using a real-time EEG dataset published by Graz University, Austria. The dataset consists of two-channel EEG signals of right-hand and left-hand movement imagery from 8 subjects. A total of 120 motor imagery trials (60 left and 60 right) were recorded from each subject. The EEG signals are filtered, and feature vectors consisting of 24, 32, and 40 relative band power values (RBPV) are extracted. The feature vectors are classified by three different methods: linear discriminant analysis (LDA), K nearest neighbor (KNN), and support vector machines (SVM). Results show that the best performance was achieved by the 24-RBPV feature vector and the LDA classification method. © 2017 Elsevier B.V., All rights reserved.
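A relative band power value is commonly computed as the signal power inside a frequency band divided by the total power. The sketch below illustrates that idea with a naive discrete Fourier transform. The sampling rate, band edges, and function names are assumptions for illustration, since the abstract does not state how its 24, 32, or 40 values were defined.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Naive DFT power spectrum for non-negative frequencies (O(n^2))."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def relative_band_power(signal, fs, bands):
    """Return power in each (low, high) band divided by total non-DC power."""
    freqs, power = power_spectrum(signal, fs)
    total = sum(power[1:])  # skip the DC component at index 0
    if total == 0:
        return [0.0 for _ in bands]
    return [
        sum(p for f, p in zip(freqs, power) if f > 0 and low <= f < high) / total
        for low, high in bands
    ]


# Hypothetical one-second, 128 Hz recording of a pure 10 Hz oscillation.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
rbp = relative_band_power(signal, fs, bands=[(8, 13), (13, 30)])
```

For real EEG recordings a fast FFT routine would replace the quadratic-time loop; the naive transform is used here only to keep the example dependency-free.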
  • Conference Object
    Citation - Scopus: 3
    Protein İkincil Yapı Tahmini İçin Makine Öğrenmesi Yöntemlerinin Karşılaştırılması [Comparison of Machine Learning Methods for Protein Secondary Structure Prediction]
    (Institute of Electrical and Electronics Engineers Inc., 2018-05) Aydin, Zafer; Kaynar, Oğuz; Görmez, Yasin; Işik, Yunus Emre
    Three-dimensional structure prediction is one of the important problems in bioinformatics and theoretical chemistry. One of the most important steps in three-dimensional structure prediction is the estimation of secondary structure. Due to rapidly growing databases and recent feature extraction methods, datasets used for predicting secondary structure can potentially contain a large number of samples and dimensions. For this reason, it is important to use algorithms that are fast and accurate. In this study, various classification algorithms have been optimized for the second phase of a two-stage classifier on the EVAset benchmark, both in the original input space and in the space reduced using the information gain metric. The most accurate classifier is the support vector machine, while the extreme learning machine is significantly faster in model training. © 2018 Elsevier B.V., All rights reserved.
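Information gain scores a feature by how much knowing its value reduces the entropy of the class labels; features with low scores can then be dropped to shrink the input space. A minimal sketch for discrete features follows. The function names and toy labels are illustrative, and the paper's actual features and discretization are not described in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of `labels` minus its entropy conditioned on the feature.

    Assumes a discrete feature; continuous features would need binning first.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy labels: H = helix, E = strand (placeholders, not data from the paper).
labels = ["H", "H", "E", "E"]
informative = information_gain(["a", "a", "b", "b"], labels)
uninformative = information_gain(["a", "b", "a", "b"], labels)
```

Here the first feature separates the two classes perfectly and receives the maximum gain of one bit, while the second carries no information about the label and scores zero.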
  • Conference Object
    Citation - Scopus: 1
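    Koroner Arter Hastalığı Tanısı İçin Alan Bilgisi İçeren Topluluk Öznitelik Seçim Yöntemi [An Ensemble Feature Selection Method Incorporating Domain Knowledge for Coronary Artery Disease Diagnosis]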
    (Institute of Electrical and Electronics Engineers Inc., 2020-10-05) Kolukisa, Burak; Güngör, Vehbi Çağrı; Bakir-Güngör, Burcu
    Coronary Artery Disease (CAD) is the condition in which the heart is not fed enough as a result of the accumulation of fatty matter called atheroma in the walls of the arteries. In 2016, CAD accounted for 31% (17.9 million) of the world's total deaths, and its diagnosis is difficult. It is estimated that approximately 23.6 million people will die from this disease in 2030. With the development of machine learning and data mining techniques, it might be possible to diagnose CAD inexpensively and easily by examining some physical and biochemical values. In this study, a novel ensemble feature selection methodology that incorporates domain knowledge is proposed for the CAD classification problem. The proposed methodology is applied to the UCI Cleveland CAD dataset, and performance metrics are compared across different classification algorithms. In our experiments, when the Multilayer Perceptron classifier is used with 9 selected features, our proposed solution reached 85.47% accuracy, 82.96% accuracy and 0.839 F-Measure. As future work, we aim to generate a machine learning model that can quickly diagnose CAD on real-time data in hospitals. © 2021 Elsevier B.V., All rights reserved.