Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Search Results

Now showing 1 - 3 of 3
  • Article
    Citation - WoS: 4
    Citation - Scopus: 4
    Sample Reduction Strategies for Protein Secondary Structure Prediction
    (MDPI, 2019-10-18) Atasever, Sema; Aydin, Zafer; Erbay, Hasan; Sabzekar, Mostafa
    Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the protein databases and datasets that can be used for training prediction models grow considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the training set of the SVM. The first method randomly selects a fraction of data samples from the training set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the training set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the feature vectors of the data samples with a hierarchical clustering algorithm and replaces the training set samples with the nearest neighbors of the cluster centers in order to reduce the training time. The number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the training set by 26% without reducing the prediction accuracy. Among the clustering techniques, Ward's method provided the best accuracy on test data.
  • Article
    Citation - WoS: 13
    Citation - Scopus: 20
    IGPRED: Combination of Convolutional Neural and Graph Convolutional Networks for Protein Secondary Structure Prediction
    (Wiley, 2021-05-25) Gormez, Yasin; Sabzekar, Mostafa; Aydin, Zafer
    There is a close relationship between the tertiary structure and the function of a protein. One of the important steps in determining the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have yielded promising improvements in several machine learning applications, including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSI-BLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model, IGPRED, obtained Q3 accuracies of 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% on the CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    3-State Protein Secondary Structure Prediction Based on Scope Classes
    (Inst Tecnologia Parana, 2021) Atasever, Sema; Azginoglu, Nuh; Erbay, Hasan; Aydin, Zafer
    Improving the accuracy of protein secondary structure prediction has been an important task in bioinformatics, since it is not only the starting point for obtaining tertiary structure in hierarchical modeling but also enhances sequence analysis and sequence-structure threading to help determine structure and function. Herein we present a model based on the DSPRED classifier, a hybrid method composed of dynamic Bayesian networks and a support vector machine, to predict 3-state secondary structure information of proteins. We used the SCOPe (Structural Classification of Proteins-extended) database to train and test the model. The results show that DSPRED reached a Q3 accuracy of 82.36% when trained and tested using proteins from all SCOPe classes. We compared our method with the popular PSIPRED on the SCOPe test datasets and found that our method outperformed PSIPRED.
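
Two ideas recur in the abstracts above: stratified random selection of training samples (first article) and Q3 accuracy, the percentage of residues whose 3-state label is predicted correctly (second and third articles). The following is a minimal Python sketch of both, not the authors' implementation. The function names `stratified_subsample` and `q3_accuracy`, the `seed` parameter, and the single-letter labels H, E, C (helix, strand, coil) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices that keep about `fraction` of the samples per class.

    Hypothetical helper: each class is sampled separately so that the class
    proportions of the reduced set stay close to those of the full set.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Keep at least one sample so that no class disappears entirely.
        n_keep = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


def q3_accuracy(true_states, predicted_states):
    """Percentage of positions where the predicted state equals the true state."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if len(true_states) == 0:
        raise ValueError("sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


if __name__ == "__main__":
    labels = list("HHHHEECCCC")  # toy per-residue labels, not real data
    kept = stratified_subsample(labels, fraction=0.5)
    print(len(kept), "of", len(labels), "samples kept")
    print(q3_accuracy("HHEEC", "HHECC"))
```

In practice the indices returned by the subsampling helper would select rows of a feature matrix before training a classifier; the toy strings here only show the bookkeeping.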