Browsing by Author "Gormez, Yasin"

Now showing 1 - 8 of 8

Citation - WoS: 13
Citation - Scopus: 19
A Deep Learning Approach With Bayesian Optimization and Ensemble Classifiers for Detecting Denial of Service Attacks
(Wiley, 2020) Gormez, Yasin; Aydin, Zafer; Karademir, Ramazan; Gungor, Vehbi C.
Detecting malicious behavior is important for preventing security threats in a computer network. Denial of Service (DoS) is among the popular cyber attacks targeted at web sites of high-profile organizations and can potentially have high economic and time costs. In this paper, several machine learning methods including ensemble models and autoencoder-based deep learning classifiers are compared and tuned using Bayesian optimization. The autoencoder framework enables to extract new features by mapping the original input to a new space. The methods are trained and tested both for binary and multi-class classification on Digiturk and Labris datasets, which were introduced recently for detecting various types of DDoS attacks. The best performing methods are found to be ensembles though deep learning classifiers achieved comparable level of accuracy.
Citation - WoS: 2
Citation - Scopus: 3
Multi Fragment Melting Analysis System (MFMAS) for One-Step Identification of Lactobacilli
(Elsevier, 2020) Kesmen, Zulal; Kilic, Ozge; Gormez, Yasin; Celik, Mete; Bakir-Gungor, Burcu
The accurate identification of lactobacilli is essential for the effective management of industrial practices associated with lactobacilli strains, such as the production of fermented foods or probiotic supplements. For this reason, in this study, we proposed the Multi Fragment Melting Analysis System (MFMAS)-lactobacilli based on high resolution melting (HRM) analysis of multiple DNA regions that have high interspecies heterogeneity for fast and reliable identification and characterization of lactobacilli. The MFMAS-lactobacilli is a new and customized version of the MFMAS, which was developed by our research group. MFMAS-lactobacilli is a combined system that consists of i) a ready-to-use plate, which is designed for multiple HRM analysis, and ii) a data analysis software, which is used to characterize lactobacilli species via incorporating machine learning techniques. Simultaneous HRM analysis of multiple DNA fragments yields a fingerprint for each tested strain and the identification is performed by comparing the fingerprints of unknown strains with those of known lactobacilli species registered in the MFMAS. In this study, a total of 254 isolates, which were recovered from fermented foods and probiotic supplements, were subjected to MFMAS analysis, and the results were confirmed by a combination of different molecular techniques. All of the analyzed isolates were exactly differentiated and accurately identified by applying the single-step procedure of MFMAS, and it was determined that all of the tested isolates belonged to 18 different lactobacilli species. The individual analysis of each target DNA region provided identification with an accuracy range from 59% to 90% for all tested isolates. However, when each target DNA region was analyzed simultaneously, perfect discrimination and 100% accurate identification were obtained even in closely related species. As a result, it was concluded that MFMAS-lactobacilli is a multi-purpose method that can be used to differentiate, classify, and identify lactobacilli species. Hence, our proposed system could be a potential alternative to overcome the inconsistencies and difficulties of the current methods.
Citation - WoS: 12
Citation - Scopus: 19
IGPRED: Combination of Convolutional Neural and Graph Convolutional Networks for Protein Secondary Structure Prediction
(Wiley, 2021) Gormez, Yasin; Sabzekar, Mostafa; Aydin, Zafer
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model, based on convolutional neural network and graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
Citation - WoS: 3
Citation - Scopus: 5
IGPRED-Multitask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility
(IEEE Computer Soc, 2023) Gormez, Yasin; Aydin, Zafer
Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps to predict 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-Multitask, a deep learning model with multi task learning architecture based on deep inception network, graph convolutional network and a bidirectional long short-term memory is proposed. Moreover, hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test data sets as in the OPUS-TASS paper including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93, CAMEO93_HARD, as well as the train and validation sets, are used for fair comparison with the literature. Statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angel prediction on 3 datasets compared to the state-of-the-art methods. For solvent accessibility prediction, TEST2016 and TEST2018 datasets are used only to assess the performance of the proposed model.
Citation - WoS: 6
Citation - Scopus: 7
Dimensionality Reduction for Protein Secondary Structure and Solvent Accesibility Prediction
(World Scientific Publ Co Pte Ltd, 2018) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin
Secondary structure and solvent accessibility prediction provide valuable information for estimating the three dimensional structure of a protein. As new feature extraction methods are developed the dimensionality of the input feature space increases steadily. Reducing the number of dimensions provides several advantages such as faster model training, faster prediction and noise elimination. In this work, several dimensionality reduction techniques have been employed including various feature selection methods, autoencoders and PCA for protein secondary structure and solvent accessibility prediction. The reduced feature set is used to train a support vector machine at the second stage of a hybrid classifier. Cross-validation experiments on two difficult benchmarks demonstrate that the dimension of the input space can be reduced substantially while maintaining the prediction accuracy. This will enable the incorporation of additional informative features derived for predicting the structural properties of proteins without reducing the accuracy due to overfitting.
Citation - WoS: 1
The Identification of Discriminative Single Nucleotide Polymorphism Sets for the Classification of Behcet's Disease
(IEEE, 2018) Gormez, Yasin; Isik, Yunus Emre; Bakir-Gungor, Burcu
Behcet's disease is a long-term multisystem inflammatory disorder, characterized by recurrent attacks affecting several organs. As the genotyping individuals get cheaper and easier following the developments in genomic technologies, genome-wide association studies (GWAS) emerged. By this means, via studying big-sized case-control groups for a specific disease, potential genetic variations, single nucleotide polymorphisms (SNPs) are identified. Although several genetic risk factors are identified for Behcet's disease with the help of these studies via scanning around a million of SNPs, these variations could only explain up to 200/u of the disease's genetic risk. In this study, for Behcet's disease classification, via comparing all the SNPs genotyped in GWAS, with the SNPs selected via using genetic knowledge, gain ratio and information gain; both reduction in the feature size and improvement in the classification accuracy is aimed. Also, using different classification algorithms such as random forest, k-nearest neighbour and logistic regression, their effects on the classification accuracy are investigated. Our results showed that compared to other feature selection methods, with at least 81% success rate, the selection of the SNPs using the genetic information (of their GWAS p-values, indicating the significance of the SNP against the disease) provides 15% to 42% improvement in all classification algorithms. This improvement is statistically sound. While gain ratio and information gain feature selection techniques yield similar classification accuracies, the models using all SNPs could not exceed 50% accuracies and results in the worst performance.
Citation - WoS: 2
Citation - Scopus: 1
Feature Selection for Protein Dihedral Angle Prediction
(IEEE, 2017) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin
Three-dimensional structure prediction has crucial importance for bioinformatics and theoretical chemistry. One of the main steps of three-dimensional structure prediction is dihedral (torsion) angle prediction. As new feature extraction methods are developed the dimension of the input space increases considerably yielding longer model training and less accurate models due to noisy or redundant features. In this study, feature selection is employed for dimensionality reduction on one of the established benchmarks of protein 1D structure prediction. Experimental results show that the feature selection improves the accuracy of protein dihedral angle class prediction by 2% and can eliminate up to %82 of the features when random forest classifier is used. Accurate prediction of dihedral angles will eventually contribute to protein structure prediction.
Citation - WoS: 6
Citation - Scopus: 7
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
(IEEE Computer Soc, 2022) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder in which the etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting the knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods, incorporating biological information and aimed to develop a machine-learning model for diagnosing Behcet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.