Scopus-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395


Predicting Respiratory Infection and Symptoms Development Using Gene Set Enrichment Scores and Machine Learning
(Elsevier Sci Ltd, 2026) Aydin, Zafer; Isik, Yunus Emre
Recent advancements in precision medicine enable personalized predictions grounded in individual-level genetic data. However, relying solely on a single type of data can decrease prediction accuracy and limit the biological interpretability of the resulting models. Incorporating predefined genetic knowledge, such as derived gene sets, can improve performance and provide deeper biological insights for complex diseases, including respiratory infections. This study aimed to evaluate the usability of enrichment scores (ES), calculated using gene sets from the Molecular Signatures Database (MSigDB), as a feature representation for machine learning models to predict respiratory viral infections and symptom development. In addition, the proposed feature representation approach was extensively compared with the de facto gene-level expression representation. A total of 36,834 predefined gene sets were compiled from the MSigDB, and their ES values were calculated. Experiments used the GSE73072 dataset from Gene Expression Omnibus, containing gene expression profiles before and after virus exposure. Various machine learning and feature selection algorithms were applied to ES-based and probe-level feature sets. The results showed that both feature representation approaches achieved an area under the precision-recall curve (AUPRC) value greater than 0.90 for all tasks. Compared with the Respiratory Viral DREAM Challenge leaderboard phase, our models showed a 14.8% improvement in pre-exposure predictions (T0) and a 17.4% improvement in symptom classification. Using enrichment scores as a feature representation generally resulted in better performance than probe-level representation when predicting respiratory infections and symptom development. Identifying key gene sets through feature selection and comparing them with essential genes for respiratory viruses enabled a more comprehensive analysis, providing deeper insights into the pathways that contribute to these predictions.
Frequency-Based Deep Occlusion Awareness Instance Segmentation
(MDPI, 2026-02-26) Guzel, Yasin; Aydin, Zafer; Talu, Muhammed Fatih
One major challenge faced by deep learning-based methods that detect target objects in the form of bounding boxes is object occlusion. High degrees of occlusion significantly diminish the accuracy of instance segmentation. Nonetheless, complex-valued Fourier descriptors can robustly represent object boundaries using minimal information. This study investigated how integrating Fourier descriptors, renowned for their strong representational capacity, with deep network models (UNet) that exhibit high generalization performance affects instance segmentation accuracy. Within the scope of the research, nine network models were designed based on different strategies for utilizing frequency components. These variants fall into four strategy families: (i) UNet-style spectrum regression on fixed low-frequency windows (FUNet), (ii) magnitude-guided frequency selection/ROI construction (FUNet-Thr, FUNet-BBox), (iii) sequence models over tokenized FFT coefficients (BiLSTM Patch/Sorted), and (iv) encoder-only spectrum predictors with different depth/capacity (EncoderFFT1/2). To fairly evaluate the models' performance in segmenting objects subjected to disruptive factors (e.g., occlusion, blurring, noise), a specialized synthetic dataset was prepared. The task is formulated as single-target (single-instance), single-class segmentation. This dataset, automatically generated according to initial parameter values, contains images of objects moving at various speeds within a single frame. Among these models, the one termed FUNet, which relies on partial matching of central frequency components, achieved the highest segmentation accuracy despite the disruptive effects. Under the challenging Dataset 8 setting, the proposed FUNet achieved the highest overlap-based performance (Dice = 0.9329, IoU = 0.8842) among Attention U-Net, U-Net, and FourierNet, with statistically significant gains confirmed by paired per-image tests.
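The claim that complex-valued Fourier descriptors can represent a boundary with very little information can be illustrated with a minimal sketch. It is a generic textbook construction and not the FUNet model from the abstract: the contour is treated as a sequence of complex points, a naive discrete Fourier transform is applied, and the shape is rebuilt from only the lowest frequencies. The function names `dft` and `reconstruct`, the `keep` parameter, and the circular test contour are all assumptions made for illustration.

```python
import cmath

def dft(points):
    """Naive O(n^2) discrete Fourier transform of a closed contour given as complex points."""
    n = len(points)
    return [
        sum(z * cmath.exp(-2j * cmath.pi * k * t / n) for t, z in enumerate(points)) / n
        for k in range(n)
    ]

def reconstruct(coeffs, keep):
    """Rebuild the contour from the DC term plus the `keep` lowest positive and negative frequencies."""
    n = len(coeffs)
    # Indices near 0 are low positive frequencies; indices near n are low negative ones.
    kept = [c if (k <= keep or k >= n - keep) else 0 for k, c in enumerate(coeffs)]
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(kept))
        for t in range(n)
    ]

# Hypothetical test shape: a circle of radius 2 centred at (1, 1), sampled at 32 points.
n = 32
contour = [complex(1, 1) + 2 * cmath.exp(2j * cmath.pi * t / n) for t in range(n)]

coeffs = dft(contour)
approx = reconstruct(coeffs, keep=1)
max_error = max(abs(a - b) for a, b in zip(approx, contour))
```

For this circle, the centre sits in the DC coefficient and the radius in the first harmonic, so three of the 32 coefficients reproduce the boundary up to floating-point error. Real object boundaries need more coefficients, and how many to keep is the kind of choice the frequency-selection strategies above address.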
Citation - WoS: 7
Citation - Scopus: 8
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
(IEEE Computer Soc, 2022-05-01) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential causes, genetic changes in thousands of genes should be analyzed. In addition, further analysis is needed to find out which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we attempted to identify representative SNPs using feature selection methods that incorporate biological information, and aimed to develop a machine-learning model for diagnosing Behcet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.
Citation - WoS: 3
Citation - Scopus: 3
Template Scoring Methods for Protein Torsion Angle Prediction
(Springer-Verlag Berlin, 2015) Aydin, Zafer; Baker, David; Noble, William Stafford
Prediction of backbone torsion angles provides important constraints about the 3D structure of a protein and is receiving growing interest in the structure prediction community. In this paper, we introduce a three-stage machine learning classifier to predict the 7-state torsion angles of a protein. The first two stages employ dynamic Bayesian and neural networks to produce an ab-initio prediction of torsion angle states starting from sequence profiles. The third stage is a committee classifier, which combines the ab-initio prediction with a structural frequency profile derived from templates obtained by HHsearch. We develop several structural profile models and obtain significant improvements over the Laplacian scoring technique through: (1) scaling templates by integer powers of sequence identity score, (2) incorporating other alignment scores as multiplicative factors, and (3) adjusting or optimizing parameters of the profile models with respect to the similarity interval of the target. We also demonstrate that the torsion angle prediction accuracy improves at all levels of target-template similarity, even when templates are distant from the target. The improvement increases significantly as template structures get closer to the target.
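The idea of scaling templates by an integer power of the sequence identity score, point (1) above, can be sketched as follows. This is only an illustration under stated assumptions, not the scoring models from the paper: the function `structural_profile`, its input layout, and the toy templates are hypothetical. Each template votes for the torsion state it aligns to a target position, with a weight of identity raised to `power`, and each position is then normalized into a frequency profile.

```python
def structural_profile(templates, n_states, power=2):
    """Build a per-position state frequency profile from weighted templates.

    templates: list of (identity, states), where identity is in [0, 1] and
    states[i] is the state index aligned to target position i, or None for a gap.
    """
    length = len(templates[0][1])
    profile = [[0.0] * n_states for _ in range(length)]
    for identity, states in templates:
        weight = identity ** power  # closer templates contribute much more
        for pos, state in enumerate(states):
            if state is not None:
                profile[pos][state] += weight
    # Normalize each covered position so its entries sum to one.
    for row in profile:
        total = sum(row)
        if total > 0:
            for s in range(n_states):
                row[s] /= total
    return profile

# Hypothetical toy input: two templates aligned to a three-residue target, three states.
templates = [(0.9, [0, 1, None]), (0.3, [2, 1, 2])]
profile = structural_profile(templates, n_states=3, power=2)
```

With `power=2`, the template at 90% identity outweighs the one at 30% identity by a factor of nine at the first position, which shows how a higher power concentrates the profile on the closest templates.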
Citation - WoS: 2
Citation - Scopus: 2
Structural Profile Matrices for Predicting Structural Properties of Proteins
(World Scientific Publ Co Pte Ltd, 2020-07-10) Azginoglu, Nuh; Aydin, Zafer; Celik, Mete
Predicting structural properties of proteins plays a key role in predicting the 3D structure of proteins. In this study, new structural profile matrices (SPM) are developed for protein secondary structure, solvent accessibility and torsion angle class predictions, which could be used as input to 3D prediction algorithms. The structural templates employed in computing SPMs are detected by eight alignment methods in the LOMETS server, a gap affine alignment method, ScanProsite, PfamScan, and HHblits. The contribution of each template is weighted by its similarity to the target, which is assessed by several sequence alignment scores. For comparison, the SPMs are also computed using Homolpro, which uses BLAST for target-template alignments and does not assign weights to templates. When the SPMs are incorporated into the DSPRED classifier, the prediction accuracy improves significantly, as demonstrated by cross-validation experiments on two difficult benchmarks. The most accurate predictions are obtained using the SPMs derived by threading methods in the LOMETS server. On the other hand, the computational cost of computing these SPMs was the highest.
Citation - WoS: 1
Citation - Scopus: 8
Short Term Electricity Load Forecasting: A Case Study of Electric Utility Market in Turkey
(Institute of Electrical and Electronics Engineers Inc., 2015-04) Ishik, Muhammed Yasin; Göze, Tolga; Ozcan, Ihsan; Güngör, Vehbi Çağrı; Aydin, Zafer; Yasin, Muhammed
With the recent developments in the energy sector, the pricing of electricity is now governed by the spot market, where a variety of market mechanisms are in effect. After the new legislation on market liberalization in Turkey, competition based on hourly price has received growing interest in the energy market, which has required generators and electric utility companies to add new dimensions to their scope of operation: short-term load and price forecasting. The field offers several opportunities, though it is not free from challenges. The dynamic behavior of the market price has caused the electric load to become variable and non-stationary. Furthermore, the number of nodes at which the load must be predicted is no longer constant and can no longer be estimated by experts alone. In this competitive scenario, statistical forecasting methods that can automatically and accurately process thousands of data samples are essential. The purpose of this study is to demonstrate the importance of short-term load forecasting, to show how it has received growing interest in Turkey, and to propose an artificial neural network that can forecast the short-term electricity load. Through detailed performance evaluations, we demonstrate that our forecasting method is capable of predicting the hourly load accurately.
Citation - WoS: 5
Citation - Scopus: 5
Sample Reduction Strategies for Protein Secondary Structure Prediction
(MDPI, 2019-10-18) Atasever, Sema; Aydin, Zafer; Erbay, Hasan; Sabzekar, Mostafa
Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples by a hierarchical clustering algorithm and replaces the train set samples with nearest neighbors of the cluster centers in order to improve the training time. For the hierarchical clustering of the feature vectors, the number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy. Among the clustering techniques, Ward's method provided the best accuracy on test data.
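The first sample reduction method, stratified random selection of a fraction of the train set, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `stratified_subsample`, the `fraction` and `seed` parameters, and the toy label distribution are assumptions. The point is that each class is sampled separately, so the class proportions of the reduced set match those of the full set.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subset that preserves class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so that no class disappears.
        n_keep = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)

# Hypothetical toy labels for three secondary structure classes (helix, strand, loop).
labels = ["H"] * 40 + ["E"] * 20 + ["L"] * 40
subset = stratified_subsample(labels, fraction=0.5)
subset_labels = [labels[i] for i in subset]
```

The returned indices would then select the rows of the feature matrix used to train the SVM. Fixing `seed` makes the selection reproducible across runs.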
Citation - WoS: 11
Citation - Scopus: 20
ROI Detection in Mammogram Images Using Wavelet-Based Haralick and Hog Features
(IEEE, 2018-12) Tasdemir, Sena Busra Yengec; Tasdemir, Kasim; Aydin, Zafer; Yengec Tasdemir, Sena Busra
Digital mammography is a widespread medical imaging technique that is used for early detection and diagnosis of breast cancer. Detecting the region of interest (ROI) helps to locate the abnormal areas, which may be analyzed further by a radiologist or a CAD system. In this paper, a new classification method is proposed for ROI detection in mammography images. Features are extracted using Wavelet transform, Haralick and HOG descriptors. To reduce the number of dimensions and eliminate irrelevant features, a wrapper-based feature selection method is implemented. Several feature extraction methods and machine learning classifiers are compared by performing a leave-one-image-out cross-validation experiment on a difficult dataset. The proposed feature extraction method provides the best accuracy of 87.5% and the second-best area under curve (AUC) score of 84% when employed in a random forest classifier.
Citation - WoS: 7
Citation - Scopus: 11
Protein β-Sheet Prediction Using an Efficient Dynamic Programming Algorithm
(Elsevier Sci Ltd, 2017-10) Sabzekar, Mostafa; Naghibzadeh, Mahmoud; Eghdami, Mandie; Aydin, Zafer
Predicting the beta-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in beta-sheets. To achieve reliable long-range interactions, a promising approach is to enumerate and rank all beta-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space grows exponentially with respect to the number of beta-strands. Additionally, brute force calculation in this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the space of the problem efficiently to reduce the time complexity of the problem. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores the intermediate results, avoids repetitive calculation by reusing them efficiently in successive steps, and reduces the space of the problem by removing those intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate beta-sheet structures are found by searching all possible conformations, and secondly, the time complexity of the problem is reduced by searching the space of the problem efficiently, which makes the proposed method applicable to predicting beta-sheet structures with a high number of beta-strands. Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with the state-of-the-art beta-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar. (C) 2017 Elsevier Ltd. All rights reserved.
Citation - WoS: 2
Citation - Scopus: 5
Open Source Slurm Computer Cluster System Design and a Sample Application
(Institute of Electrical and Electronics Engineers Inc., 2017-10) Azgınoglu, Nuh; Atasever, Mehmet Umut; Aydin, Zafer; Celik, Mete; Erbay, Hasan
Cluster computing combines the resources of multiple computers so that they act like a single high-performance computer. In this study, a computer cluster consisting of a Lustre distributed file system, one cluster server based on the Slurm resource management system, and thirteen compute nodes was built using available idle computers with different processors. Different bioinformatics algorithms were run on different data sets in the cluster, and the performance of the cluster was evaluated by the amount of time it took to finish the jobs.
