Browsing by Author "Gormez, Yasin"
Now showing 1 - 12 of 12

Other Comparison of Machine Learning Classifiers for Protein Secondary Structure Prediction (IEEE, 2018) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin; Isik, Yunus Emre; 0000-0001-7686-6298; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
Predicting the three-dimensional structures of proteins is one of the important problems in theoretical chemistry and bioinformatics. One of the most important stages of protein structure prediction is secondary structure prediction. As a result of the rapid growth of data in protein databases and of recently developed feature extraction methods, the data sets used for secondary structure prediction are growing in both dimensionality and number of samples. It is therefore becoming important to use prediction algorithms that run fast and reach a given level of accuracy. In this study, various classification algorithms for the second stage of a two-stage hybrid classifier are optimized on the EVAset data set, both in the original feature space and in a space whose dimension is reduced using the information gain metric. According to the results, the most successful prediction method is the support vector machine, while the fastest method in terms of model training time is the extreme learning machine.

Other Comparison of NR and UniClust Databases for Protein Secondary Structure Prediction (IEEE, 2018) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin; 0000-0001-7686-6298; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
Predicting the three-dimensional structures of proteins is one of the important problems in theoretical chemistry and bioinformatics. One of the most important stages of three-dimensional structure prediction is secondary structure prediction. Improving the success rate of secondary structure prediction depends on the computed features as much as on the classification algorithm used. In the multiple alignment methods frequently used for feature extraction, the computed values vary with the database used for the alignment. It is therefore important to select a suitable database when building the feature matrices. In this study, using the CB513 data set, 5 different data sets are built with two different alignment methods and three different databases, and these data sets are compared using a two-stage hybrid classifier. According to the results, the best success rate is obtained when UniClust is used for the PSSM values computed in the first stage of the HHBlits alignment method and the NR database is used, again in the first stage of HHBlits, for the structural profile matrices.

Article Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge (NATURE RESEARCH, HEIDELBERGER PLATZ 3, BERLIN 14197, GERMANY, 2021) Aydin, Zafer; Sieberts, Solveig K.; Schaff, Jennifer; Duda, Marlena; Pataki, Balint Armin; Sun, Ming; Snyder, Phil; Daneault, Jean-Francois; Parisi, Federico; Costante, Gianluca; Rubin, Udi; Banda, Peter; Chae, Yooree; Chaibub Neto, Elias; Dorsey, E. Ray; Chen, Aipeng; Elo, Laura L.; Espino, Carlos; Glaab, Enrico; Goan, Ethan; Golabchi, Fatemeh Noushin; Gormez, Yasin; Jaakkola, Maria K.; Jonnagaddala, Jitendra; Klen, Riku; Li, Dongmei; McDaniel, Christian; Perrin, Dimitri; Perumal, Thanneer M.; Rad, Nastaran Mohammadian; Rainaldi, Erin; Sapienza, Stefano; Schwab, Patrick; Shokhirev, Nikolai; Venalainen, Mikko S.; Vergara-Diaz, Gloria; Zhang, Yuqian; Wang, Yuanjia; Guan, Yuanfang; Brunner, Daniela; Bonato, Paolo; Mangravite, Lara M.; Omberg, Larsson; AGÜ, Mühendislik Fakültesi, Elektrik - Elektronik Mühendisliği Bölümü; Aydin, Zafer
Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD).
However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).

Article A deep learning approach with Bayesian optimization and ensemble classifiers for detecting denial of service attacks (WILEY, 111 RIVER ST, HOBOKEN 07030-5774, NJ USA, 2020) Gormez, Yasin; Aydin, Zafer; Karademir, Ramazan; Gungor, Vehbi C.; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Detecting malicious behavior is important for preventing security threats in a computer network. Denial of Service (DoS) is among the popular cyber attacks targeted at web sites of high-profile organizations and can potentially have high economic and time costs. In this paper, several machine learning methods including ensemble models and autoencoder-based deep learning classifiers are compared and tuned using Bayesian optimization. The autoencoder framework enables the extraction of new features by mapping the original input to a new space. The methods are trained and tested both for binary and multi-class classification on Digiturk and Labris datasets, which were introduced recently for detecting various types of DDoS attacks.
The best performing methods are found to be ensembles, though deep learning classifiers achieved a comparable level of accuracy.

Article The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease (Institute of Electrical and Electronics Engineers Inc., 2021) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer; Burcu, Bakir-Gungor
Behçet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods that incorporate biological information, and aimed to develop a machine-learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, up to 99.64% accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.
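
As a rough illustration of the repeated feature selection mentioned in the abstract above, the sketch below counts how often each feature ranks among the top k across cross-validation folds. The information-gain scorer, the interleaved fold split, the function names, and the toy genotype data are all assumptions made for illustration; the abstract does not say which selection method, fold scheme, or parameters the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting the labels on one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    scores = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    # Ties are broken by feature index so the ranking is deterministic.
    ranked = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return ranked[:k]


def selection_frequency(rows, labels, k, n_folds):
    """Count how many folds select each feature, using training parts only."""
    counts = Counter()
    for fold in range(n_folds):
        # Hypothetical interleaved split: sample i is held out in fold i % n_folds.
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        counts.update(
            top_k_features([rows[i] for i in train], [labels[i] for i in train], k)
        )
    return counts


# Toy data (hypothetical): 8 samples, 3 SNP-like features coded 0/1/2.
# Feature 0 tracks the label, feature 1 is constant, feature 2 is noise.
ROWS = [
    [0, 1, 0], [2, 1, 0], [0, 1, 1], [2, 1, 1],
    [0, 1, 2], [2, 1, 2], [0, 1, 0], [2, 1, 0],
]
LABELS = [0, 1, 0, 1, 0, 1, 0, 1]
```

Calling `selection_frequency(ROWS, LABELS, k=1, n_folds=4)` on the toy data selects feature 0 in every fold. Features that are chosen in most folds are the ones such a procedure would report as most distinctive.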

Article Dimensionality reduction for protein secondary structure and solvent accesibility prediction (IMPERIAL COLLEGE PRESS, 57 SHELTON ST, COVENT GARDEN, LONDON WC2H 9HE, ENGLAND, 2018) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Secondary structure and solvent accessibility prediction provide valuable information for estimating the three-dimensional structure of a protein. As new feature extraction methods are developed, the dimensionality of the input feature space increases steadily. Reducing the number of dimensions provides several advantages such as faster model training, faster prediction and noise elimination. In this work, several dimensionality reduction techniques have been employed including various feature selection methods, autoencoders and PCA for protein secondary structure and solvent accessibility prediction. The reduced feature set is used to train a support vector machine at the second stage of a hybrid classifier. Cross-validation experiments on two difficult benchmarks demonstrate that the dimension of the input space can be reduced substantially while maintaining the prediction accuracy. This will enable the incorporation of additional informative features derived for predicting the structural properties of proteins without reducing the accuracy due to overfitting.

Conference Object Feature Selection for Protein Dihedral Angle Prediction (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2017) Aydin, Zafer; Kaynar, Oguz; Gormez, Yasin; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Three-dimensional structure prediction has crucial importance for bioinformatics and theoretical chemistry. One of the main steps of three-dimensional structure prediction is dihedral (torsion) angle prediction.
As new feature extraction methods are developed, the dimension of the input space increases considerably, yielding longer model training and less accurate models due to noisy or redundant features. In this study, feature selection is employed for dimensionality reduction on one of the established benchmarks of protein 1D structure prediction. Experimental results show that feature selection improves the accuracy of protein dihedral angle class prediction by 2% and can eliminate up to 82% of the features when a random forest classifier is used. Accurate prediction of dihedral angles will eventually contribute to protein structure prediction.

Conference Object The Identification of Discriminative Single Nucleotide Polymorphism Sets for the Classification of Behcet's Disease (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2018) Gormez, Yasin; Isik, Yunus Emre; Bakir-Gungor, Burcu; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Behcet's disease is a long-term multisystem inflammatory disorder, characterized by recurrent attacks affecting several organs. As genotyping individuals became cheaper and easier following developments in genomic technologies, genome-wide association studies (GWAS) emerged. In these studies, large case-control groups for a specific disease are examined to identify potential genetic variations, namely single nucleotide polymorphisms (SNPs). Although several genetic risk factors have been identified for Behcet's disease with the help of these studies by scanning around a million SNPs, these variations could explain only up to 20% of the disease's genetic risk. In this study, for Behcet's disease classification, all the SNPs genotyped in GWAS are compared with the SNPs selected using genetic knowledge, gain ratio and information gain, with the aim of both reducing the feature size and improving the classification accuracy.
Also, the effects of different classification algorithms, such as random forest, k-nearest neighbour and logistic regression, on the classification accuracy are investigated. Our results showed that, compared to the other feature selection methods, selecting the SNPs using genetic information (their GWAS p-values, which indicate the significance of the SNP for the disease) gives a success rate of at least 81% and provides a 15% to 42% improvement in all classification algorithms. This improvement is statistically sound. While the gain ratio and information gain feature selection techniques yield similar classification accuracies, the models using all SNPs could not exceed 50% accuracy and resulted in the worst performance.

Article IGPRED-MultiTask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility (IEEE COMPUTER SOC, 2023) Gormez, Yasin; Aydin, Zafer; 0000-0001-7686-6298; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps to predict the 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-MultiTask, a deep learning model with a multi-task learning architecture based on a deep inception network, a graph convolutional network and a bidirectional long short-term memory, is proposed. Moreover, hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test data sets as in the OPUS-TASS paper including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93, CAMEO93_HARD, as well as the train and validation sets, are used for fair comparison with the literature.
Statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets compared to the state-of-the-art methods. For solvent accessibility prediction, only the TEST2016 and TEST2018 datasets are used to assess the performance of the proposed model.

Article IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction (WILEY, 111 RIVER ST, HOBOKEN 07030-5774, NJ, 2021) Gormez, Yasin; Sabzekar, Mostafa; Aydin, Zafer; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Aydin, Zafer
There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization.
The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.

Article Multi fragment melting analysis system (MFMAS) for one-step identification of lactobacilli (ELSEVIER, RADARWEG 29, 1043 NX AMSTERDAM, NETHERLANDS, 2020) Kesmen, Zulal; Kilic, Ozge; Gormez, Yasin; Celik, Mete; Bakir-Gungor, Burcu; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
The accurate identification of lactobacilli is essential for the effective management of industrial practices associated with lactobacilli strains, such as the production of fermented foods or probiotic supplements. For this reason, in this study, we proposed the Multi Fragment Melting Analysis System (MFMAS)-lactobacilli based on high resolution melting (HRM) analysis of multiple DNA regions that have high interspecies heterogeneity for fast and reliable identification and characterization of lactobacilli. The MFMAS-lactobacilli is a new and customized version of the MFMAS, which was developed by our research group. MFMAS-lactobacilli is a combined system that consists of i) a ready-to-use plate, which is designed for multiple HRM analysis, and ii) a data analysis software, which is used to characterize lactobacilli species via incorporating machine learning techniques. Simultaneous HRM analysis of multiple DNA fragments yields a fingerprint for each tested strain and the identification is performed by comparing the fingerprints of unknown strains with those of known lactobacilli species registered in the MFMAS. In this study, a total of 254 isolates, which were recovered from fermented foods and probiotic supplements, were subjected to MFMAS analysis, and the results were confirmed by a combination of different molecular techniques.
All of the analyzed isolates were exactly differentiated and accurately identified by applying the single-step procedure of MFMAS, and it was determined that all of the tested isolates belonged to 18 different lactobacilli species. The individual analysis of each target DNA region provided identification with an accuracy range from 59% to 90% for all tested isolates. However, when all target DNA regions were analyzed simultaneously, perfect discrimination and 100% accurate identification were obtained even in closely related species. As a result, it was concluded that MFMAS-lactobacilli is a multi-purpose method that can be used to differentiate, classify, and identify lactobacilli species. Hence, our proposed system could be a potential alternative to overcome the inconsistencies and difficulties of the current methods.

Conference Object NSEM: Novel Stacked Ensemble Method for Sentiment Analysis (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2018) Emre Isik, Yunus; Gormez, Yasin; Kaynar, Oguz; Aydin, Zafer; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Today, people often share their ideas, opinions and feelings through forums, social media sites, blogs and similar platforms. For this reason, access to these data has become very easy. The increase in the number of shares makes it possible to analyze and use these data for marketing and politics. However, due to the large amount of data, it is impossible for humans to perform this analysis. Sentiment analysis methods automatically determine what type of emotion a text contains. In these methods, the text is represented as a mathematical vector and classified by machine learning methods. Ensemble methods are among the most important methods used as classifiers in sentiment analysis. In these methods, the errors of one classifier are corrected by another classifier. In sentiment analysis, the feature vector that describes the text is as important as the classifier.
Feature vectors obtained using different methods can make mistakes in different places. For this reason, in this study, NSEM is proposed for sentiment analysis, which is a new ensemble method that uses 2 different classifiers and 2 different feature extraction methods. As a result of the analysis, the proposed method is the most successful method with an accuracy rate of 79.1%.