WoS İndeksli Yayınlar Koleksiyonu
Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/394
Browse
10 results
Search Results
Article Topological Feature Generation for Link Prediction in Biological Networks(PeerJ Inc, 2023-05-09) Temiz, Mustafa; Bakir-Gungor, Burcu; Sahan, Pinar Guner; Coskun, Mustafa; Güner Şahan, PınarGraph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.Article Citation - WoS: 48Citation - Scopus: 65Review of Feature Selection Approaches Based on Grouping of Features(PeerJ Inc, 2023-07-17) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Bulut, Nurten; Qaqish, Bahjat; Yousef, MalikWith the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality including redundancy and irrelevancy poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually; and then perform FS either by eliminating lower ranked features or by retaining highly -ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.Article Citation - WoS: 2Citation - Scopus: 4RCE-IFE: Recursive Cluster Elimination With Intra-Cluster Feature Elimination(PeerJ Inc, 2025-02-07) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat; Yousef, MalikThe computational and interpretational difficulties caused by the ever-increasing dimensionality of biological data generated by new technologies pose a significant challenge. Feature selection (FS) methods aim to reduce the dimension, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant features. In this work, we propose the Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE) method that utilizes feature grouping and iterates grouping and elimination steps in a supervised context. We assess dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains. For a set of gene expression, MicroRNA (miRNA) expression, and methylation datasets, the performance of RCE-IFE is comparatively evaluated with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. On average, RCE-IFE attains an area under the curve (AUC) of 0.85 among tested expression datasets with the fewest features and the shortest running time, while RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively when averaged over seven different metagenomics datasets, with RCE-IFE significantly reducing feature subsets. Furthermore, RCE-IFE surpasses several state-of-the-art FS methods, such as Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE gives a similar average accuracy rate of 89.27% using fewer features on four cancer-related datasets. The comparability of RCE-IFE is also verified with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet. Additionally, the biological relevance of the selected features by RCE-IFE is evaluated. The proposed method also exhibits high consistency in terms of the selected features across multiple runs. Our experimental findings imply that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.Article Citation - WoS: 4Citation - Scopus: 7Network Anomaly Detection Using Deep Autoencoder and Parallel Artificial Bee Colony Algorithm-Trained Neural Network(PeerJ Inc, 2024-10-08) Hacilar, Hilal; Dedeturk, Bilge Kagan; Bakir-Gungor, Burcu; Gungor, Vehbi CagriCyberattacks are increasingly becoming more complex, which makes intrusion detection extremely difficult. Several intrusion detection approaches have been developed in the literature and utilized to tackle computer security intrusions. Implementing machine learning and deep learning models for network intrusion detection has been a topic of active research in cybersecurity. In this study, artificial neural networks (ANNs), a type of machine learning algorithm, are employed to determine optimal network weight sets during the training phase. Conventional training algorithms, such as back- propagation, may encounter challenges in optimization due to being entrapped within local minima during the iterative optimization process; global search strategies can be slow at locating global minima, and they may suffer from a low detection rate. In the ANN training, the Artificial Bee Colony (ABC) algorithm enables the avoidance of local minimum solutions by conducting a high-performance search in the solution space but it needs some modifications. To address these challenges, this work suggests a Deep Autoencoder (DAE)-based, vectorized, and parallelized ABC algorithm for training feed-forward artificial neural networks, which is tested on the UNSW-NB15 and NF-UNSW-NB15-v2 datasets. Our experimental results demonstrate that the proposed DAE-based parallel ABC-ANN outperforms existing metaheuristics, showing notable improvements in network intrusion detection. The experimental results reveal a notable improvement in network intrusion detection through this proposed approach, exhibiting an increase in detection rate (DR) by 0.76 to 0.81 and a reduction in false alarm rate (FAR) by 0.016 to 0.005 compared to the ANN-BP algorithm on the UNSWNB15 dataset. Furthermore, there is a reduction in FAR by 0.006 to 0.0003 compared to the ANN-BP algorithm on the NF-UNSW-NB15-v2 dataset. These findings underscore the effectiveness of our proposed approach in enhancing network security against network intrusions.Article Citation - WoS: 35Citation - Scopus: 42Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Different Feature Selection Methods(PeerJ Inc, 2022-04-25) Bakir-Gungor, Burcu; Lar, Hilal Hac; Jabeer, Amhar; Nalbantoglu, Ozkan Ufuk; Aran, Oya; Yousef, Malik; Hacilar, HilalThe tremendous boost in next generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome-the collective genomes of the microbial community that reside in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, Inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD diseases, a comprehensive picture of the gut microbiome in IBD patients is far from being complete. Due to the complexity of metagenomic studies, the applications of the state-of-the-art machine learning techniques became popular to address a wide range of questions in the field of metagenomic data analysis. In this regard, using IBD associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms, (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD and these findings might be useful for the development of microbiome-based diagnostics.Article Citation - WoS: 152Citation - Scopus: 157Computational Analysis of MicroRNA-Mediated Interactions in SARS-CoV Infection(PeerJ Inc, 2020-06-05) Demirci, Muserref Duygu Sacar; Adan, AysunMicroRNAs (miRNAs) are post-transcriptional regulators of gene expression found in more than 200 diverse organisms. Although it is still not fully established if RNA viruses could generate miRNAs, there are examples of miRNA like sequences from RNA viruses with regulatory functions. In the case of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there are several mechanisms that would make miRNAs impact the virus, like interfering with viral replication, translation and even modulating the host expression. In this study, we performed a machine learning based miRNA prediction analysis for the SARS-CoV-2 genome to identify miRNA-like hairpins and searched for potential miRNA-based interactions between the viral miRNAs and human genes and human miRNAs and viral genes. Overall, 950 hairpin structured sequences were extracted from the virus genome and based on the prediction results, 29 of them could be precursor miRNAs. Targeting analysis showed that 30 viral mature miRNA-like sequences could target 1,367 different human genes. PANTHER gene function analysis results indicated that viral derived miRNA candidates could target various human genes involved in crucial cellular processes including transcription, metabolism, defense system and several signaling pathways such as Wnt and EGFR signalings. Protein class-based grouping of targeted human genes showed that host transcription might be one of the main targets of the virus since 96 genes involved in transcriptional processes were potential targets of predicted viral miRNAs. For instance, basal transcription machinery elements including several components of human mediator complex (MED1, MED9, MED 12L, MED 19), basal transcription factors such as TAF4, TAF5, TAF7L and site-specific transcription factors such as STATI were found to be targeted. In addition, many known human miRNAs appeared to be able to target viral genes involved in viral life cycle such as S, M, N, E proteins and ORF lab, ORF3a, ORF8, ORF7a and ORF10. Considering the fact that miRNA-based therapies have been paid attention, based on the findings of this study, comprehending mode of actions of miRNAs and their possible roles during SARS-CoV-2 infections could create new opportunities for the development and improvement of new therapeutics.Article Citation - WoS: 4Citation - Scopus: 5Comparative Analysis of Machine Learning Approaches for Predicting Respiratory Virus Infection and Symptom Severity(PeerJ Inc, 2023-06-30) Isik, Yunus Emre; Aydin, ZaferRespiratory diseases are among the major health problems causing a burden on hospitals. Diagnosis of infection and rapid prediction of severity without time-consuming clinical tests could be beneficial in preventing the spread and progression of the disease, especially in countries where health systems remain incapable. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held such as Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenge which is a community-driven organization with a mission to research biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising, however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available gene expression dataset in the Gene Expression Omnibus, named GSE73072, containing samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)) was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance. The experimental results showed that the proposed approaches obtained a prediction performance of 0.9746 area under the precision-recall curve (AUPRC) for infection (i.e., shedding) prediction (SC-1), 0.9182 AUPRC for symptom class prediction (SC-2), and 0.6733 Pearson correlation for symptom score prediction (SC-3) by outperforming the best leaderboard scores of Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), which is a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied using the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate the development of future studies that concentrate on predicting not only infections but also the associated symptoms.Article Citation - WoS: 4Citation - Scopus: 5CSA-DE-LR: Enhancing Cardiovascular Disease Diagnosis With a Novel Hybrid Machine Learning Approach(PeerJ Inc, 2024-07-18) Dedeturk, Beyhan Adanur; Dedeturk, Bilge Kagan; Bakir-Gungor, BurcuCardiovascular diseases (CVD) are a leading cause of mortality globally, necessitating the development of efficient diagnostic tools. Machine learning (ML) and metaheuristic algorithms have become prevalent in addressing these challenges, providing promising solutions in medical diagnostics. However, traditional ML approaches often need to be improved in feature selection and optimization, leading to suboptimal performance in complex diagnostic tasks. To overcome these limitations, this study introduces a new hybrid method called CSA-DE-LR, which combines the clonal selection algorithm (CSA) and differential evolution (DE) with logistic regression. This integration is designed to optimize logistic regression weights efficiently for the accurate classification of CVD. The methodology employs three optimization strategies based on the F1 score, the Matthews correlation coefficient (MCC), and the mean absolute error (MAE). Extensive evaluations on benchmark datasets, namely Cleveland and Statlog, reveal that CSA-DELR outperforms state-of-the-art ML methods. In addition, generalization is evaluated using the Breast Cancer Wisconsin Original (WBCO) and Breast Cancer Wisconsin Diagnostic (WBCD) datasets. Significantly, the proposed model demonstrates superior efficacy compared to previous research studies in this domain. This study's findings highlight the potential of hybrid machine learning approaches for improving diagnostic accuracy, offering a significant advancement in the fields of medical data analysis and CVD diagnosis.Article Citation - WoS: 18Citation - Scopus: 29Blockchain for Genomics and Healthcare: A Literature Review, Current Status, Classification and Open Issues(PeerJ Inc, 2021-09-30) Dedeturk, Beyhan Adanur; Soran, Ahmet; Bakir-Gungor, BurcuThe tremendous boost in the next generation sequencing technologies and in the "omics"technologies resulted in the generation of hundreds of gigabytes of data per day. Nowadays, via integrating -omics data with other data types, such as imaging and electronic health record (EHR) data, panomics studies attempt to identify novel and potentially actionable biomarkers for personalized medicine applications. In this respect, for the accurate analysis of -omics data and EHR, there is a need to establish secure and robust pipelines that take the ethical aspects into consideration, regulate privacy and ownership issues, and data sharing. These days, blockchain technology has picked up significant attention in diverse fields, including genomics, since it offers a new solution for these problems from a different perspective. Blockchain is an immutable transaction ledger, which offers secure and distributed system without a central authority. Within the system, each transaction can be expressed with cryptographically signed blocks, and the verification of transactions is performed by the users of the network. In this review, firstly, we aim to highlight the challenges of EHR and genomic data sharing. Secondly, we attempt to answer "Why"or "Why not"the blockchain technology is suitable for genomics and healthcare applications in detail. Thirdly, we elucidate the general blockchain structure based on the Ethereum, which is a more suitable technology for the genomic data sharing platforms. Fourthly, we review current blockchain-based EHR and genomic data sharing platforms, evaluate the advantages and disadvantages of these applications, and classify these applications using different metrics. Finally, we conclude by discussing the open issues and introducing our suggestion on the topic. In summary, to facilitate the diagnosis, monitoring and therapy of diseases with the effective analysis of -omics data with other available data types, through this review, we put forward the possible implications of the blockchain technology to life sciences and healthcare.Article Citation - Scopus: 4Aguhyper: A Hyperledger-Based Electronic Health Record Management Framework(PeerJ Inc, 2024-05-22) Dedeturk, Beyhan Adanur; Bakir-Gungor, BurcuThe increasing importance of healthcare records, particularly given the emergence of new diseases, emphasizes the need for secure electronic storage and dissemination. With these records dispersed across diverse healthcare entities, their physical maintenance proves to be excessively time-consuming. The prevalent management of electronic healthcare records (EHRs) presents inherent security vulnerabilities, including susceptibility to attacks and potential breaches orchestrated by malicious actors. To tackle these challenges, this article introduces AguHyper, a secure storage and sharing solution for EHRs built on a permissioned blockchain framework. AguHyper utilizes Hyperledger Fabric and the InterPlanetary Distributed File System (IPFS). Hyperledger Fabric establishes the blockchain network, while IPFS manages the off -chain storage of encrypted data, with hash values securely stored within the blockchain. Focusing on security, privacy, scalability, and data integrity, AguHyper ' s decentralized architecture eliminates single points of failure and ensures transparency for all network participants. The study develops a prototype to address gaps identi fi ed in prior research, providing insights into blockchain technology applications in healthcare. Detailed analyses of system architecture, AguHyper ' s implementation con fi gurations, and performance assessments with diverse datasets are provided. The experimental setup incorporates CouchDB and the Raft consensus mechanism, enabling a thorough comparison of system performance against existing studies in terms of throughput and latency. This contributes signi fi cantly to a comprehensive evaluation of the proposed solution and offers a unique perspective on existing literature in the fi eld.
