Scopus İndeksli Yayınlar Koleksiyonu

Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/395

Browse

Search Results

Now showing 1 - 6 of 6
  • Article
    Topological Feature Generation for Link Prediction in Biological Networks
    (PeerJ Inc, 2023-05-09) Temiz, Mustafa; Bakir-Gungor, Burcu; Sahan, Pinar Guner; Coskun, Mustafa; Güner Şahan, Pınar
    Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
  • Article
    Citation - WoS: 48
    Citation - Scopus: 65
    Review of Feature Selection Approaches Based on Grouping of Features
    (PeerJ Inc, 2023-07-17) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Bulut, Nurten; Qaqish, Bahjat; Yousef, Malik
    With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality including redundancy and irrelevancy poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually; and then perform FS either by eliminating lower ranked features or by retaining highly -ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
  • Article
    Citation - WoS: 35
    Citation - Scopus: 42
    Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Different Feature Selection Methods
    (PeerJ Inc, 2022-04-25) Bakir-Gungor, Burcu; Lar, Hilal Hac; Jabeer, Amhar; Nalbantoglu, Ozkan Ufuk; Aran, Oya; Yousef, Malik; Hacilar, Hilal
    The tremendous boost in next generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome-the collective genomes of the microbial community that reside in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, Inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD diseases, a comprehensive picture of the gut microbiome in IBD patients is far from being complete. Due to the complexity of metagenomic studies, the applications of the state-of-the-art machine learning techniques became popular to address a wide range of questions in the field of metagenomic data analysis. In this regard, using IBD associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms, (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD and these findings might be useful for the development of microbiome-based diagnostics.
  • Article
    Citation - WoS: 152
    Citation - Scopus: 157
    Computational Analysis of MicroRNA-Mediated Interactions in SARS-CoV Infection
    (PeerJ Inc, 2020-06-05) Demirci, Muserref Duygu Sacar; Adan, Aysun
    MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression found in more than 200 diverse organisms. Although it is still not fully established if RNA viruses could generate miRNAs, there are examples of miRNA like sequences from RNA viruses with regulatory functions. In the case of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there are several mechanisms that would make miRNAs impact the virus, like interfering with viral replication, translation and even modulating the host expression. In this study, we performed a machine learning based miRNA prediction analysis for the SARS-CoV-2 genome to identify miRNA-like hairpins and searched for potential miRNA-based interactions between the viral miRNAs and human genes and human miRNAs and viral genes. Overall, 950 hairpin structured sequences were extracted from the virus genome and based on the prediction results, 29 of them could be precursor miRNAs. Targeting analysis showed that 30 viral mature miRNA-like sequences could target 1,367 different human genes. PANTHER gene function analysis results indicated that viral derived miRNA candidates could target various human genes involved in crucial cellular processes including transcription, metabolism, defense system and several signaling pathways such as Wnt and EGFR signalings. Protein class-based grouping of targeted human genes showed that host transcription might be one of the main targets of the virus since 96 genes involved in transcriptional processes were potential targets of predicted viral miRNAs. For instance, basal transcription machinery elements including several components of human mediator complex (MED1, MED9, MED 12L, MED 19), basal transcription factors such as TAF4, TAF5, TAF7L and site-specific transcription factors such as STATI were found to be targeted. In addition, many known human miRNAs appeared to be able to target viral genes involved in viral life cycle such as S, M, N, E proteins and ORF lab, ORF3a, ORF8, ORF7a and ORF10. Considering the fact that miRNA-based therapies have been paid attention, based on the findings of this study, comprehending mode of actions of miRNAs and their possible roles during SARS-CoV-2 infections could create new opportunities for the development and improvement of new therapeutics.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 5
    Comparative Analysis of Machine Learning Approaches for Predicting Respiratory Virus Infection and Symptom Severity
    (PeerJ Inc, 2023-06-30) Isik, Yunus Emre; Aydin, Zafer
    Respiratory diseases are among the major health problems causing a burden on hospitals. Diagnosis of infection and rapid prediction of severity without time-consuming clinical tests could be beneficial in preventing the spread and progression of the disease, especially in countries where health systems remain incapable. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held such as Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenge which is a community-driven organization with a mission to research biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising, however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available gene expression dataset in the Gene Expression Omnibus, named GSE73072, containing samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)) was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance. The experimental results showed that the proposed approaches obtained a prediction performance of 0.9746 area under the precision-recall curve (AUPRC) for infection (i.e., shedding) prediction (SC-1), 0.9182 AUPRC for symptom class prediction (SC-2), and 0.6733 Pearson correlation for symptom score prediction (SC-3) by outperforming the best leaderboard scores of Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), which is a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied using the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate the development of future studies that concentrate on predicting not only infections but also the associated symptoms.
  • Article
    Citation - WoS: 18
    Citation - Scopus: 29
    Blockchain for Genomics and Healthcare: A Literature Review, Current Status, Classification and Open Issues
    (PeerJ Inc, 2021-09-30) Dedeturk, Beyhan Adanur; Soran, Ahmet; Bakir-Gungor, Burcu
    The tremendous boost in the next generation sequencing technologies and in the "omics"technologies resulted in the generation of hundreds of gigabytes of data per day. Nowadays, via integrating -omics data with other data types, such as imaging and electronic health record (EHR) data, panomics studies attempt to identify novel and potentially actionable biomarkers for personalized medicine applications. In this respect, for the accurate analysis of -omics data and EHR, there is a need to establish secure and robust pipelines that take the ethical aspects into consideration, regulate privacy and ownership issues, and data sharing. These days, blockchain technology has picked up significant attention in diverse fields, including genomics, since it offers a new solution for these problems from a different perspective. Blockchain is an immutable transaction ledger, which offers secure and distributed system without a central authority. Within the system, each transaction can be expressed with cryptographically signed blocks, and the verification of transactions is performed by the users of the network. In this review, firstly, we aim to highlight the challenges of EHR and genomic data sharing. Secondly, we attempt to answer "Why"or "Why not"the blockchain technology is suitable for genomics and healthcare applications in detail. Thirdly, we elucidate the general blockchain structure based on the Ethereum, which is a more suitable technology for the genomic data sharing platforms. Fourthly, we review current blockchain-based EHR and genomic data sharing platforms, evaluate the advantages and disadvantages of these applications, and classify these applications using different metrics. Finally, we conclude by discussing the open issues and introducing our suggestion on the topic. In summary, to facilitate the diagnosis, monitoring and therapy of diseases with the effective analysis of -omics data with other available data types, through this review, we put forward the possible implications of the blockchain technology to life sciences and healthcare.