Scopus-Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395
5 results
Search Results
Article
Citation - WoS: 4, Citation - Scopus: 4
Sample Reduction Strategies for Protein Secondary Structure Prediction
(MDPI, 2019-10-18) Atasever, Sema; Aydin, Zafer; Erbay, Hasan; Sabzekar, Mostafa
Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples by a hierarchical clustering algorithm and replaces the train set samples with nearest neighbors of the cluster centers in order to improve the training time. To cluster the feature vectors, the hierarchical clustering method is implemented, for which the number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy.
Among the clustering techniques, Ward's method provided the best accuracy on test data.

Article
Citation - Scopus: 25
Recursive Cluster Elimination Based Rank Function (SVM-RCE-R) Implemented in KNIME
(F1000 Research Ltd, 2021-01-05) Yousef, Malik; Bakir-Güngör, Burcu; Jabeer, Amhar; Göy, Gökhan; Qureshi, Rehman A.; C. Showe, Louise
In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented this approach in Matlab. Interest in this approach has grown over time and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships of those genes in the selection process; this led to our development of SVM-RCE-R. SVM-RCE-R further enhances the capabilities of SVM-RCE by the addition of a novel user-specified ranking function. This ranking function enables the user to stipulate the weights of the accuracy, sensitivity, specificity, f-measure, area under the curve and the precision in the ranking function. This flexibility allows the user to select for greater sensitivity or greater specificity as needed for a specific project. The usefulness of SVM-RCE-R is further supported by development of the maTE tool, which uses a similar approach to identify MicroRNA (miRNA) targets. We have also now implemented the SVM-RCE-R algorithm in Knime in order to make it easier to apply. The use of SVM-RCE-R in Knime is simple and intuitive and allows researchers to immediately begin their analysis without having to consult an information technology specialist. The input for the Knime implemented tool is an EXCEL file (or text or CSV) with a simple structure and the output is also an EXCEL file.
The Knime version also incorporates new features not available in SVM-RCE. The results show that the inclusion of the ranking function has a significant impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics. © 2021 Elsevier B.V., All rights reserved.

Article
Citation - WoS: 1, Citation - Scopus: 1
PSO Supported Ensemble Algorithm for Bad Data Detection Against Intelligent Hacking Algorithm
(Frontiers Media S.A., 2021-07-23) Yavuz, Levent; Soran, Ahmet; Onen, Ahmet; Muyeen, S. M.
Power system cybersecurity has recently become important due to cyber-attacks. Because malicious attackers use advanced computer science and machine learning (ML) applications, cybersecurity is becoming crucial to creating sustainable, reliable, efficient, and well-protected cyber-systems. Power system operators need to develop sophisticated detection mechanisms. In this study, a novel machine-learning-based detection algorithm that combines the five most popular ML algorithms with a Particle Swarm Optimizer (PSO) is developed and tested by using an intelligent hacking algorithm that was specially developed to measure the effectiveness of this study. The hacking algorithm provides three different types of injections: random, continuous random, and slow injections in an adaptive manner, which makes detection harder. Results show that recall values with the proposed algorithm increased for each type of attack.

Article
Citation - Scopus: 5
Hyperplastic and Tubular Polyp Classification Using Machine Learning and Feature Selection
(Elsevier B.V., 2024) Doğan, Refika Sultan; Akay, Ebru; Doğan, Serkan; Yilmaz, Bulent
Purpose: The aim of this study is to develop an effective approach for differentiating between hyperplastic and tubular adenoma colon polyps, which is one of the most difficult tasks in colonoscopy procedures.
The main research challenge is how to improve the classification of these polyp subtypes by applying various focusing levels on the polyp images, data preprocessing approaches, and classification algorithms. Methods: This study employed 202 colonoscopy videos from a total of 201 patients, focusing on 59 videos containing hyperplastic and tubular adenoma polyps. Key frames were extracted manually, and several feature extraction and classification techniques were applied. The influence of different datasets with various focuses as well as data preprocessing steps on the performance of classification was examined, and AUC values were calculated using ten classifiers. Results: The study found that the choice of dataset, data preprocessing method, and classification algorithm all had significant effects on classification results. The Random Forest model with the Recursive Feature Elimination (RFE) feature selection approach, for example, consistently outperformed other models and achieved the highest AUC value of 0.9067. In terms of accuracy, F1 score, recall, and AUC, the suggested model outperformed a gastroenterologist; however, its precision remained slightly lower. Conclusion: This study emphasizes the importance of dataset selection, data preprocessing, and feature selection in enhancing the classification of difficult colon polyp subtypes. The suggested model offers a promising approach for the clinical differentiation of hyperplastic and tubular adenoma polyps, potentially improving diagnostic accuracy in gastroenterology.
© 2024 Elsevier B.V., All rights reserved.

Article
Citation - WoS: 1, Citation - Scopus: 1
3-State Protein Secondary Structure Prediction Based on Scope Classes
(Inst Tecnologia Parana, 2021) Atasever, Sema; Azginoglu, Nuh; Erbay, Hasan; Aydin, Zafer
Improving the accuracy of protein secondary structure prediction has been an important task in bioinformatics since it is not only the starting point in obtaining tertiary structure in hierarchical modeling but also enhances sequence analysis and sequence-structure threading to help determine structure and function. Herein we present a model based on the DSPRED classifier, a hybrid method composed of dynamic Bayesian networks and a support vector machine, to predict 3-state secondary structure information of proteins. We used the SCOPe (Structural Classification of Proteins-extended) database to train and test the model. The results show that DSPRED reached a Q(3) accuracy rate of 82.36% when trained and tested using proteins from all SCOPe classes. We compared our method with the popular PSIPRED on the SCOPe test datasets and found that our method outperformed PSIPRED.
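
For readers unfamiliar with the metric, Q(3) accuracy (often written Q3) is the percentage of residues whose predicted 3-state label (helix H, strand E, coil C) matches the true label. The following is a minimal sketch of that calculation only; the function name and the example label strings are hypothetical and are not taken from the DSPRED evaluation.

```python
def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of positions where the predicted
    3-state label (H, E or C) equals the true label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted label strings must have equal length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    # Count residue positions where the two labels agree.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Hypothetical 10-residue example: 8 of 10 labels agree.
true_labels = "HHHHEEEECC"
pred_labels = "HHHCEEEECH"
print(f"Q3 = {q3_accuracy(true_labels, pred_labels):.2f}%")  # prints "Q3 = 80.00%"
```

In practice the count is accumulated over all residues of all proteins in a test set, so longer proteins contribute more to the reported figure than shorter ones.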
