Browsing by Author "Sabzekar, Mostafa"

Now showing 1 - 4 of 4

IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction
(WILEY111 RIVER ST, HOBOKEN 07030-5774, NJ, 2021) Gormez, Yasin; Sabzekar, Mostafa; Aydin, Zafer; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model, based on convolutional neural network and graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
A noise-aware feature selection approach for classification
(SPRINGERONE NEW YORK PLAZA, SUITE 4600 , NEW YORK, NY 10004, UNITED STATES, 2021) Sabzekar, Mostafa; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
A noise-aware version of support vector machines is utilized for feature selection in this paper. Combining this method and sequential backward search (SBS), a new algorithm for removing irrelevant features is proposed. Although feature selection methods in the literature which utilize support vector machines have provided acceptable results, noisy samples and outliers may affect the performance of SVM and feature selections method, consequently. Recently, we have proposed relaxed constrains SVM (RSVM) which handles noisy data and outliers. Each training sample in RSVM is associated with a degree of importance utilizing the fuzzy c-means clustering method. Therefore, a less importance degree is assigned to noisy data and outliers. Moreover, RSVM has more relaxed constraints that can reduce the effect of noisy samples. Feature selection increases the accuracy of different machine learning applications by eliminating noisy and irrelevant features. In the proposed RSVM-SBS feature selection algorithm, noisy data have small effect on eliminating irrelevant features. Experimental results using real-world data verify that RSVM-SBS has better results in comparison with other feature selection approaches utilizing support vector machines.
Protein beta-sheet prediction using an efficient dynamic programming algorithm
(ELSEVIER SCI LTDTHE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, OXON, ENGLAND, 2017) Sabzekar, Mostafa; Naghibzadeh, Mahmoud; Eghdami, Mandie; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Predicting the beta-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in beta-sheets. To achieve reliable long-range interactions, a promising approach is to enumerate and rank all beta-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space of the problem grows exponentially with respect to the number of beta-strands. Additionally, brute force calculation in this conformational space leads to dealing with a combinatorial explosion problem with intractable computational complexity. The main contribution of this paper is to generate and search the space of the problem efficiently to reduce the time complexity of the problem. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming is proposed that stores the intermediate results, avoids repetitive calculation by repeatedly uses them efficiently in successive steps and reduces the space of the problem by removing those intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate beta-sheet structures are found by searching all possible conformations, and secondly, the time complexity of the problem is reduced by searching the space of the problem efficiently which makes the proposed method applicable to predict beta-sheet structures with high number of beta-strands. Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and the prediction accuracy in comparison with the state-of-the-art beta-sheet structure prediction methods Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar. (C) 2017 Elsevier Ltd. All rights reserved.
Sample Reduction Strategies for Protein Secondary Structure Prediction
(MDPI, ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND, 2019) Atasever, Sema; Aydın, Zafer; Erbay, Hasan; Sabzekar, Mostafa; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü;
Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the large size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM) has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples by a hierarchical clustering algorithm and replaces the train set samples with nearest neighbors of the cluster centers in order to improve the training time. To cluster the feature vectors, the hierarchical clustering method is implemented, for which the number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy. Among the clustering techniques Ward's method provided the best accuracy on test data. Keywords