PubMed Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/397

Now showing 1 - 9 of 9

Predicting Respiratory Infection and Symptoms Development Using Gene Set Enrichment Scores and Machine Learning
(Elsevier Sci Ltd, 2026) Aydin, Zafer; Isik, Yunus Emre
Recent advancements in precision medicine enable personalized predictions grounded in individual-level genetic data. However, relying solely on a single type of data can decrease prediction accuracy and limit the biological interpretability of the resulting models. Incorporating predefined genetic knowledge, such as derived gene sets, can improve performance and provide deeper biological insights for complex diseases, including respiratory infections. This study aimed to evaluate the usability of enrichment scores (ES), calculated using gene sets from the Molecular Signatures Database (MSigDB), as a feature representation for machine learning models to predict respiratory viral infections and symptom development. In addition, the proposed feature representation approach was extensively compared with the de facto gene-level expression representation. A total of 36,834 predefined gene sets were compiled from the MSigDB, and their ES values were calculated. Experiments used the GSE73072 dataset from Gene Expression Omnibus, containing gene expression profiles before and after virus exposure. Various machine learning and feature selection algorithms were applied to ES-based and probe-level feature sets. The results showed that both feature representation approaches achieved an area under the precision-recall curve (AUPRC) value greater than 0.90 for all tasks. Compared with the Respiratory Viral DREAM Challenge leaderboard phase, our models showed a 14.8% improvement in pre-exposure predictions (T0) and a 17.4% improvement in symptom classification. Using enrichment scores as a feature representation generally resulted in better performance than probe-level representation when predicting respiratory infections and symptom development. Identifying key gene sets through feature selection and comparing them with essential genes for respiratory viruses enabled a more comprehensive analysis, providing deeper insights into the pathways that contribute to these predictions.
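The abstract does not say how the enrichment scores were computed (single-sample methods such as ssGSEA or GSVA are common choices). As a deliberately simplified, hypothetical stand-in, the Python sketch below scores each sample by the mean z-scored expression of a gene set's member genes. This turns a genes-by-samples matrix into a gene-sets-by-samples feature matrix that a classifier could consume. All gene names, set names and values are made up.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene's expression across samples (population std)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr, gene_sets):
    """Return {set_name: [score per sample]} as the mean z-score of member genes.

    Simplified stand-in for an enrichment score (assumption, not ssGSEA/GSVA).
    Genes absent from the expression matrix are ignored; a set with no
    measured genes is skipped.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]
        if not members:
            continue
        scores[name] = [mean(z[g][j] for g in members) for j in range(n_samples)]
    return scores


# Hypothetical toy data: 3 genes x 4 samples and two hypothetical gene sets.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
gene_sets = {
    "SET_UP": ["GENE_A", "GENE_B"],
    "SET_MIXED": ["GENE_A", "GENE_C", "GENE_X"],  # GENE_X is not measured
}
features = gene_set_scores(expr, gene_sets)
```

Here the co-rising genes in `SET_UP` give a score that increases across samples, while the opposing genes in `SET_MIXED` cancel out, which illustrates how set-level features summarize coordinated expression rather than single probes.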
GraphUnet-SS: A Novel Deep Learning Model for Protein Secondary Structure Prediction Based on U-Net Architecture
(Elsevier Ltd, 2026-04) Aydin, Zafer; Görmez, Yasin; Sabzekar, Mostafa
Citation - WoS: 7
Citation - Scopus: 8
The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behcet's Disease
(IEEE Computer Soc, 2022-05-01) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu
Behcet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors play roles in the development of BD. To find the underlying causes, genetic changes across thousands of genes must be analyzed, and further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomic data and for selecting the most representative Single Nucleotide Polymorphisms (SNPs) as features for the clinical diagnosis process. In this study, we attempted to identify representative SNPs using feature selection methods that incorporate biological information, and we aimed to develop a machine learning model for diagnosing Behcet's disease. By combining biological information with machine learning classifiers, a disease prediction accuracy of up to 99.64 percent is achieved using only 13,611 of 311,459 SNPs. In addition, we revealed the most distinctive SNPs by performing repeated feature selection in cross-validation experiments.
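The idea of repeating feature selection inside cross-validation and keeping the SNPs that are selected most often can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the per-SNP score (absolute difference in mean minor-allele count between cases and controls), the helper names, the contiguous fold scheme and the toy genotypes are all hypothetical, and the biological prior knowledge used in the study is not modeled.

```python
from collections import Counter


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def score_snp(genotypes, labels, idx, snp):
    """Hypothetical score: |mean genotype (0/1/2) in cases - mean in controls|."""
    cases = [genotypes[i][snp] for i in idx if labels[i] == 1]
    controls = [genotypes[i][snp] for i in idx if labels[i] == 0]
    if not cases or not controls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def selection_frequency(genotypes, labels, k_folds=5, top_k=2):
    """Count how often each SNP index lands in the top_k of a training fold."""
    n_snps = len(genotypes[0])
    counts = Counter()
    for train, _test in kfold_indices(len(labels), k_folds):
        # Selection uses only the training part of each fold.
        ranked = sorted(range(n_snps),
                        key=lambda s: score_snp(genotypes, labels, train, s),
                        reverse=True)
        counts.update(ranked[:top_k])
    return counts


# Toy data: rows are samples, columns are 4 hypothetical SNPs.
genotypes = [
    [2, 2, 1, 1], [2, 2, 0, 1], [2, 2, 1, 1], [2, 1, 0, 1], [2, 2, 1, 1],  # cases
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1],  # controls
]
labels = [1] * 5 + [0] * 5
counts = selection_frequency(genotypes, labels, k_folds=5, top_k=2)
```

SNPs 0 and 1, which separate cases from controls, are selected in every fold, while the noisy and constant SNPs never are; selection frequency across folds is thus one way to rank SNPs by stability.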
Citation - WoS: 7
Citation - Scopus: 11
Protein β-Sheet Prediction Using an Efficient Dynamic Programming Algorithm
(Elsevier Sci Ltd, 2017-10) Sabzekar, Mostafa; Naghibzadeh, Mahmoud; Eghdami, Mandie; Aydin, Zafer
Predicting the beta-sheet structure of a protein is one of the most important intermediate steps towards identifying its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in beta-sheets. To model long-range interactions reliably, a promising approach is to enumerate and rank all beta-sheet conformations for a given protein and select the one with the highest score. The problem with this solution is that the search space grows exponentially with the number of beta-strands. Additionally, a brute-force search of this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the problem space efficiently in order to reduce its time complexity. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed; they model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores intermediate results, avoids repeated calculation by reusing them efficiently in successive steps, and reduces the problem space by removing intermediate results that will no longer be required in later steps. As a consequence, two contributions have been made. First, more accurate beta-sheet structures are found by searching all possible conformations; second, the time complexity is reduced by searching the problem space efficiently, which makes the proposed method applicable to predicting beta-sheet structures with a high number of beta-strands. Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy compared with state-of-the-art beta-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar. (C) 2017 Elsevier Ltd. All rights reserved.
Citation - WoS: 25
Citation - Scopus: 33
Improved Classification of Colorectal Polyps on Histopathological Images With Ensemble Learning and Stain Normalization
(Elsevier Ireland Ltd, 2023-04) Yengec-Tasdemir, Sena Busra; Aydin, Zafer; Akay, Ebru; Dogan, Serkan; Yilmaz, Bulent
Background and Objective: Early detection of colon adenomatous polyps is critically important because detecting them correctly significantly reduces the risk of developing colon cancer in the future. The key challenge in detecting adenomatous polyps is differentiating them from their visually similar counterparts, non-adenomatous tissues. Currently, this distinction depends solely on the experience of the pathologist. To assist pathologists, the objective of this work is to provide a novel non-knowledge-based Clinical Decision Support System (CDSS) for improved detection of adenomatous polyps on colon histopathology images. Methods: The domain shift problem arises when the training and test data come from different distributions because of diverse settings and unequal color levels. This problem, which can be tackled with stain normalization techniques, prevents machine learning models from attaining higher classification accuracies. In this work, the proposed method integrates stain normalization techniques with an ensemble of ConvNeXts, which are competitively accurate, scalable and robust variants of CNNs. The improvement is empirically analyzed for five widely employed stain normalization techniques. The classification performance of the proposed method is evaluated on three datasets comprising more than 10k colon histopathology images. Results: Comprehensive experiments demonstrate that the proposed method outperforms state-of-the-art deep convolutional neural network based models, attaining 95% classification accuracy on the curated dataset, and 91.1% and 90% on the EBHI and UniToPatho public datasets, respectively. Conclusions: These results show that the proposed method can accurately classify colon adenomatous polyps on histopathology images. It retains remarkable performance scores even on datasets drawn from different distributions, which indicates notable generalization ability. (c) 2023 Elsevier B.V. All rights reserved.
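The abstract does not state how the ConvNeXt predictions are combined. One common option, assumed here, is soft voting: average the class probabilities produced by each model (optionally weighted) and take the most probable class. The function name and the model outputs below are hypothetical, not results from the study.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-class probabilities from several models.

    prob_lists: one list of class probabilities per model, all the same length.
    weights: optional non-negative weight per model (defaults to equal weights).
    Returns (averaged probabilities, index of the most probable class).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
           for c in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)


# Hypothetical outputs of three models for one image,
# classes = [non-adenomatous, adenomatous].
probs, predicted = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]])
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones; here the ensemble predicts class 1 with an average probability of about 0.55.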
Citation - WoS: 4
Citation - Scopus: 7
IGPRED-Multitask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility
(IEEE Computer Soc, 2023-03-01) Gormez, Yasin; Aydin, Zafer
Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps towards predicting the 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-Multitask, a deep learning model with a multi-task learning architecture based on a deep inception network, a graph convolutional network and a bidirectional long short-term memory network, is proposed. Moreover, the hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test datasets as in the OPUS-TASS paper, including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93 and CAMEO93_HARD, as well as the same training and validation sets, are used for a fair comparison with the literature. Compared with state-of-the-art methods, statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets. For solvent accessibility prediction, only the TEST2016 and TEST2018 datasets are used to assess the performance of the proposed model.
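Torsion angles such as phi and psi are periodic, so a prediction of 170° against a true value of -170° is only 20° off. The abstract does not name the evaluation metric; a mean absolute error that wraps differences around 360°, commonly used for phi/psi prediction, can be sketched as follows (function names are illustrative).

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def angular_mae(preds, trues):
    """Mean absolute error for torsion angles, respecting 360-degree periodicity."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)


# Hypothetical predicted vs. true phi angles for three residues.
mae = angular_mae([170.0, -60.0, 0.0], [-170.0, -50.0, 90.0])  # 40.0
```

Without the wrap-around, the first residue would contribute 340° instead of 20° and the MAE would be badly inflated.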
Citation - WoS: 8
Citation - Scopus: 10
Developing Structural Profile Matrices for Protein Secondary Structure and Solvent Accessibility Prediction
(Oxford Univ Press, 2019-04-01) Aydin, Zafer; Azginoglu, Nuh; Bilgin, Halil Ibrahim; Celik, Mete
Motivation: Predicting the secondary structure and solvent accessibility of proteins is among the essential steps that precede more elaborate 3D structure prediction tasks. Incorporating class label information contained in templates with known structures has the potential to improve the accuracy of prediction methods. Building a structural profile matrix is one such technique; it provides a distribution over class labels at each amino acid position of the target. Results: In this paper, a new structural profiling technique is proposed that is based on deriving PFAM families and is combined with an existing approach. Cross-validation experiments on two benchmark datasets and at various similarity intervals demonstrate that the proposed profiling strategy performs significantly better than Homolpro, a state-of-the-art method for incorporating template information, as assessed by statistical hypothesis tests.
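A structural profile matrix gives a distribution over class labels at each target position. A minimal sketch, assuming templates have already been aligned to the target and that each template contributes its (optionally weighted) label at aligned positions. The alignment format, the weights and the fallback to a uniform distribution are assumptions, and the PFAM-based profiling proposed in the paper is not reproduced.

```python
def structural_profile(aligned_labels, classes="HEC", weights=None, pseudocount=0.0):
    """Per-position class-label distribution from templates aligned to the target.

    aligned_labels: one string per template, same length as the target, holding
        a class label (e.g. H/E/C) at aligned positions and '-' where the
        template has no aligned residue.
    weights: optional weight per template (e.g. by sequence similarity).
    Returns a list of {class: probability}; positions with no aligned
    template (and zero pseudocount) fall back to a uniform distribution.
    """
    if weights is None:
        weights = [1.0] * len(aligned_labels)
    length = len(aligned_labels[0])
    profile = []
    for pos in range(length):
        counts = {c: pseudocount for c in classes}
        for labels, w in zip(aligned_labels, weights):
            lab = labels[pos]
            if lab in counts:  # gaps ('-') are not counted
                counts[lab] += w
        total = sum(counts.values())
        if total == 0:
            profile.append({c: 1.0 / len(classes) for c in classes})
        else:
            profile.append({c: counts[c] / total for c in classes})
    return profile


# Three hypothetical templates aligned to a 4-residue target.
profile = structural_profile(["HHE-", "HCE-", "HHC-"])
```

Each row of the result can be appended to the per-residue input features of a predictor, which is how template label information enters the model.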
Citation - Scopus: 15
An Effective Colorectal Polyp Classification for Histopathological Images Based on Supervised Contrastive Learning
(Elsevier Ltd, 2024-04) Yengec-Tasdemir, Sena Busra; Aydin, Zafer; Akay, Ebru; Doğan, Serkan; Yilmaz, Bulent
Early detection of colon adenomatous polyps is pivotal in reducing colon cancer risk. In this context, accurately distinguishing between adenomatous polyp subtypes, especially tubular and tubulovillous, from hyperplastic variants is crucial. This study introduces a cutting-edge computer-aided diagnosis system optimized for this task. Our system employs advanced Supervised Contrastive learning to ensure precise classification of colon histopathology images. Significantly, we have integrated the Big Transfer model, which has gained prominence for its exemplary adaptability to visual tasks in medical imaging. Our novel approach discerns between in-class and out-of-class images, thereby elevating its discriminatory power for polyp subtypes. We validated our system using two datasets: a specially curated one and the publicly accessible UniToPatho dataset. The results reveal that our model markedly surpasses traditional deep convolutional neural networks, registering classification accuracies of 87.1% and 70.3% for the custom and UniToPatho datasets, respectively. Such results emphasize the transformative potential of our model in polyp classification endeavors. © 2024 Elsevier B.V., All rights reserved.
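Supervised contrastive learning pulls embeddings of same-class images together and pushes different-class ones apart. The sketch below computes the supervised contrastive loss as formulated by Khosla et al. (2020), in the variant that averages over positives outside the log, on toy 2-D embeddings. It is a pure-Python illustration of the loss only, not the authors' training code or their Big Transfer backbone, and the temperature value is an assumption.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors.

    embeddings: list of equal-length, non-zero vectors (L2-normalized here).
    Anchors with no other same-class sample are skipped.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    # Scaled cosine similarities between all pairs.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    losses = []
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over every sample except the anchor.
        others = [sim[i][a] for a in range(n) if a != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        losses.append(-sum(sim[i][p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy embeddings: the first two and last two points are close to each other.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = supcon_loss(emb, [0, 0, 1, 1])  # labels agree with the clusters
bad = supcon_loss(emb, [0, 1, 0, 1])   # labels cut across the clusters
```

The loss is near zero when labels match the embedding clusters and much larger when they do not, which is the signal that drives in-class images together during training.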
Citation - WoS: 14
Citation - Scopus: 14
A Continuously Benchmarked and Crowdsourced Challenge for Rapid Development and Evaluation of Models to Predict COVID-19 Diagnosis and Hospitalization
(Amer Medical Assoc, 2021-10-11) Yan, Yao; Schaffter, Thomas; Bergquist, Timothy; Yu, Thomas; Prosser, Justin; Aydin, Zafer; Mooney, Sean
IMPORTANCE Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. OBJECTIVES To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. DESIGN, SETTING, AND PARTICIPANTS This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, this challenge engaged 482 participants from 90 teams and 7 countries. MAIN OUTCOMES AND MEASURES Machine learning algorithms used patient data and output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. RESULTS In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. Analysis of the top models submitted to the challenge showed consistently better model performance on the female group than on the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. CONCLUSIONS AND RELEVANCE In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into the potential bias and limitations in the models.
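AUROC and AUPRC can be computed directly from binary labels and model scores. A minimal sketch: AUROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and AUPRC estimated by average precision, one common step-wise estimator; the challenge's exact implementation is not stated in the abstract. The labels and scores are toy values.

```python
def auroc(labels, scores):
    """P(random positive outranks random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Tied scores are broken by input order (a simplification).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
roc = auroc(labels, scores)             # 5/6, about 0.833
ap = average_precision(labels, scores)  # 5/6, about 0.833
```

A random classifier's expected AUROC is 0.5 regardless of class balance, whereas its expected AUPRC equals the fraction of positives. This is one likely reason the reported AUPRC values (0.297 and 0.188) sit far below the corresponding AUROC values.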
