Browsing by Author "Bakir-Gungor, Burcu"
Now showing 1 - 20 of 51
Article Active Subnetwork GA: A Two Stage Genetic Algorithm Approach to Active Subnetwork Search (BENTHAM SCIENCE PUBL LTD, EXECUTIVE STE Y-2, PO BOX 7917, SAIF ZONE, 1200 BR SHARJAH, U ARAB EMIRATES, 2017) Ozisik, Ozan; Bakir-Gungor, Burcu; Diri, Banu; Sezerman, Osman Ugur; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
Background: A group of interconnected genes in a protein-protein interaction network that contains most of the disease-associated genes is called an active subnetwork. Active subnetwork search is an NP-hard problem. In the last decade, methods based on simulated annealing, greedy search, color coding, genetic algorithms, and mathematical programming have been proposed for this problem. Method: In this study, we employed a novel genetic algorithm method for the active subnetwork search problem. We used an active node list chromosome representation, a branch swapping crossover operator, multicombination of branches in crossover, mutation on duplicate individuals, pruning, and a two-stage genetic algorithm approach. The proposed method was tested on simulated datasets and on the Wellcome Trust Case Control Consortium rheumatoid arthritis genome-wide association study dataset. Our results are compared with those of a simple genetic algorithm implementation and of the simulated annealing method proposed by Ideker et al. in their seminal paper. Results and Conclusion: The comparative study demonstrates that, in terms of obtained scores, our genetic algorithm approach outperforms the simple genetic algorithm implementation on all datasets and simulated annealing on all but one dataset, although our method is slower. Functional enrichment results show that the presented approach can successfully extract high-scoring subnetworks in simulated datasets and identify significant rheumatoid arthritis-associated subnetworks in the real dataset.
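The subnetwork score that active subnetwork search maximizes can be illustrated with the aggregate z-score of Ideker et al.: the z-scores of the genes in a connected subnetwork are summed and divided by the square root of the subnetwork size. This is a minimal sketch under stated assumptions, not ActSubGA itself: the gene p-values and the toy network are hypothetical, gene z-scores are obtained with the inverse normal CDF, and both the calibration against random subnetworks and the two-stage genetic search are omitted. A genetic algorithm would call a fitness function such as the hypothetical `subnetwork_fitness` on each candidate chromosome.

```python
import math
from collections import deque
from statistics import NormalDist


def gene_z(p_value: float) -> float:
    """Map a gene-level p-value to a z-score (small p -> large positive z)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def is_connected(nodes: set, edges: list) -> bool:
    """Breadth-first search: do `nodes` induce a connected subgraph?"""
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def subnetwork_fitness(nodes: set, edges: list, z_scores: dict) -> float:
    """Aggregate z-score sum(z_i) / sqrt(k); disconnected candidates get -inf."""
    if not is_connected(nodes, edges):
        return float("-inf")
    return sum(z_scores[n] for n in nodes) / math.sqrt(len(nodes))


# Hypothetical gene-level p-values (e.g. from a GWAS) and interactions.
p_values = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.02}
edges = [("A", "B"), ("B", "C"), ("B", "D")]
z = {gene: gene_z(p) for gene, p in p_values.items()}

print(subnetwork_fitness({"A", "B", "D"}, edges, z))       # leaves out uninformative C
print(subnetwork_fitness({"A", "B", "C", "D"}, edges, z))  # C (z = 0) dilutes the score
print(subnetwork_fitness({"A", "D"}, edges, z))            # disconnected -> -inf
```

Dividing by the square root of the size keeps large subnetworks from winning simply by accumulating many weakly associated genes.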
This method can easily be applied to datasets of other complex diseases to detect disease-related active subnetworks. Our implementation is freely available at https://www.ce.yildiz.edu.tr/personal/ozanoz/file/6611/ActSubGA

Article Aguhyper: a hyperledger-based electronic health record management framework (PEERJ INC, 2024) Dedeturk, Beyhan Adanur; Bakir-Gungor, Burcu; 0000-0003-4983-2417; AGÜ, Faculty of Engineering, Department of Computer Engineering; Dedeturk, Beyhan Adanur; Bakir-Gungor, Burcu
The increasing importance of healthcare records, particularly given the emergence of new diseases, underscores the need for secure electronic storage and dissemination. Because these records are dispersed across diverse healthcare entities, maintaining them physically is excessively time-consuming. The prevalent management of electronic health records (EHRs) has inherent security vulnerabilities, including susceptibility to attacks and potential breaches by malicious actors. To tackle these challenges, this article introduces AguHyper, a secure storage and sharing solution for EHRs built on a permissioned blockchain framework. AguHyper uses Hyperledger Fabric and the InterPlanetary File System (IPFS). Hyperledger Fabric establishes the blockchain network, while IPFS manages the off-chain storage of encrypted data, with the hash values stored securely on the blockchain. Focusing on security, privacy, scalability, and data integrity, AguHyper's decentralized architecture eliminates single points of failure and ensures transparency for all network participants. The study develops a prototype to address gaps identified in prior research, providing insights into applications of blockchain technology in healthcare. Detailed analyses of the system architecture, AguHyper's implementation configurations, and performance assessments with diverse datasets are provided.
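The division of labour described for AguHyper (encrypted records kept off-chain, only their hashes on the ledger) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AguHyper's chaincode: a Python dict stands in for IPFS, an append-only list stands in for the Hyperledger Fabric ledger, records are assumed to be encrypted before storage (encryption is not shown), and SHA-256 hex digests replace IPFS content identifiers. All class and function names are hypothetical.

```python
import hashlib


class OffChainStore:
    """Hypothetical stand-in for IPFS: blobs are addressed by their SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class Ledger:
    """Hypothetical stand-in for the blockchain: append-only (record_id, digest) entries."""

    def __init__(self):
        self._entries = []

    def append(self, record_id: str, digest: str) -> None:
        self._entries.append((record_id, digest))

    def latest_digest(self, record_id: str) -> str:
        for rid, digest in reversed(self._entries):
            if rid == record_id:
                return digest
        raise KeyError(record_id)


def store_record(record_id: str, ciphertext: bytes, store: OffChainStore, ledger: Ledger) -> str:
    """Put the (already encrypted) record off-chain and anchor its hash on the ledger."""
    digest = store.put(ciphertext)
    ledger.append(record_id, digest)
    return digest


def verify_record(record_id: str, store: OffChainStore, ledger: Ledger) -> bool:
    """Recompute the hash of the off-chain blob and compare it with the on-chain value."""
    digest = ledger.latest_digest(record_id)
    return hashlib.sha256(store.get(digest)).hexdigest() == digest


store, ledger = OffChainStore(), Ledger()
store_record("patient-42/visit-1", b"<encrypted EHR bytes>", store, ledger)
print(verify_record("patient-42/visit-1", store, ledger))  # True: blob matches its on-chain hash
```

Because only the digest is on the ledger, any modification of the off-chain blob is detected when the hash is recomputed, while the bulky encrypted record never enters the blockchain.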
The experimental setup incorporates CouchDB and the Raft consensus mechanism, enabling a thorough comparison of system performance with existing studies in terms of throughput and latency. This contributes to a comprehensive evaluation of the proposed solution and offers a new perspective on the existing literature in the field.

Article AMP-GSM: Prediction of Antimicrobial Peptides via a Grouping–Scoring–Modeling Approach (MDPI, 2023) Soylemez, Ummu Gulsum; Yousef, Malik; Bakir-Gungor, Burcu; 0000-0002-6602-772X; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Soylemez, Ummu Gulsum; Bakir-Gungor, Burcu
Due to the increasing resistance of bacteria to antibiotics, scientists have begun seeking new solutions to this problem. Antimicrobial peptides (AMPs) are among the most promising solutions in this field. To identify antimicrobial peptides and to aid the design and production of novel ones, there is growing interest in developing computational prediction approaches in parallel with wet-lab studies. These computational approaches aim to understand, from a machine learning perspective, what controls antimicrobial activity, and to uncover the biological properties that define it. In this study, we aim to develop a novel prediction approach that can identify peptides with high antimicrobial activity against selected target bacteria. To this end, we propose a novel method called AMP-GSM (antimicrobial peptide-grouping-scoring-modeling), which has three main components: grouping, scoring, and modeling. The grouping component creates sub-datasets by placing the physicochemical, linguistic, sequence, and structure-based features into different groups. The scoring component assigns each group a score according to its ability to distinguish antimicrobial from non-antimicrobial peptides.
In the final component (modeling), a model built using the top-ranked groups is evaluated. The method was tested on three AMP prediction datasets, and the prediction performance of AMP-GSM was compared with that of several feature selection methods and several classifiers. Using 10 features (all members of the physicochemical group), we obtained the highest area under the curve (AUC) values for both the Gram-negative (99%) and Gram-positive (98%) datasets. AMP-GSM identifies the feature groups that contribute most to AMP prediction. Several physicochemical features in AMP-GSM's final selection show how important these variables are for defining peptide characteristics and why they should be taken into account when building models to predict peptide activity.

Review Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data (MDPI, ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND, 2021) Yousef, Malik; Kumar, Abhishek; Bakir-Gungor, Burcu; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
In the last two decades, massive advances in high-throughput technologies have led to the exponential growth of public repositories of gene expression datasets for various phenotypes. Biomarkers can be uncovered by comparing gene expression levels under different conditions, such as disease vs. control, treated vs. untreated, or drug A vs. drug B. This corresponds to a well-studied problem in machine learning, namely feature selection. In biological data analysis, most computational feature selection methodologies have been adopted from other fields without considering the nature of biological data. Thus, integrative approaches that use biological knowledge while performing feature selection are needed for this kind of data.
The main idea behind the integrative gene selection process is to generate a ranked list of genes that considers both the statistical metrics applied to the gene expression data and the biological background information provided as external datasets. One of the main goals of this review is to explore existing methods that integrate different types of information to improve the identification of biomolecular signatures of diseases and the discovery of new potential treatment targets. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, and to shed light on disease state dynamics and the mechanisms of disease onset and progression. Integrating various types of biological information will require the development of novel techniques for integration and data analysis. Another aim of this review is to encourage the bioinformatics community to develop new approaches for searching for and determining significant groups or clusters of features based on one or more biological grouping functions.

Conference Object Blockchain-based Fog Computing Applications in Healthcare (IEEE, 2020) Adanur, Beyhan; Bakir-Gungor, Burcu; Soran, Ahmet; 0000-0003-4983-2417; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Adanur, Beyhan; Bakir-Gungor, Burcu; Soran, Ahmet
Recently, the use of blockchain technology in healthcare has increased. Although blockchain technology has brought many innovations to healthcare, it still has problems awaiting solutions. To offer alternative solutions to these problems, the combined use of fog computing and blockchain technology has been proposed. This study examines applications of blockchain-based fog computing in healthcare.
The aim of the study is to give readers insight into the joint use of blockchain and fog computing in healthcare. To this end, fog computing and blockchain technologies are first introduced. Then, the integration of the two fields and the advantages and disadvantages that their joint use brings to healthcare are discussed, and a system that uses these technologies together is proposed.

Article CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques (ELSEVIER, 2024) Bakir-Gungor, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre; Yousef, Malik; 0000-0002-2272-6270; 0000-0002-2839-1424; 0009-0002-4373-8526; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu; Temiz, Mustafa; Inal, Yasin; Cicekyurt, Emre
Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models use machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy needs to be improved through a holistic, biological knowledge-based perspective. This study aims to evaluate CRC-associated metagenomic data at the species, enzyme, and pathway levels by conducting global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification.
For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′,3′-cyclic 3′-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.

Other Classification of Breast Cancer Molecular Subtypes with Grouping-Scoring-Modeling Approach that Incorporates Disease-Disease Association Information (IEEE Xplore, 2024) Qumsiyeh, Emma; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
This study uses modern sequencing technology and large biological databases to investigate the molecular intricacies of complex diseases such as cancer. Using gene expression databases and biomarkers, the research aims to improve the identification of breast cancer molecular subtypes for better patient outcomes. Using the BRCA LumAB_Her2Basal dataset, this study compares an integrative machine learning-based strategy (GediNET) with traditional feature selection approaches across machine learning classifiers. GediNET excels at uncovering crucial disease-disease connections and potential biomarkers using the Grouping-Scoring-Modeling (GSM) approach, which favors groups of genes over individual genes.
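A Grouping-Scoring-Modeling loop can be sketched generically as follows. It is a simplified illustration under stated assumptions, not GediNET: the feature groups are supplied by the caller (in GediNET they derive from disease-disease association information), a nearest-centroid classifier and stratified holdout splits stand in for whatever classifiers and validation scheme the published tool uses, and all names and data are hypothetical. Groups are scored on an inner validation split, so the final AUC is measured on samples that played no part in ranking the groups.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_halves(indices, y, rng):
    """Split sample indices into two halves, preserving the class balance."""
    first, second = [], []
    for label in (0, 1):
        members = [i for i in indices if y[i] == label]
        rng.shuffle(members)
        first += members[: len(members) // 2]
        second += members[len(members) // 2 :]
    return first, second


def centroid_auc(X, y, fit_idx, eval_idx, features):
    """Fit a nearest-centroid classifier on `features` only and return its AUC."""
    def centroid(label):
        rows = [X[i] for i in fit_idx if y[i] == label]
        return {f: mean(r[f] for r in rows) for f in features}

    def sq_dist(x, c):
        return sum((x[f] - c[f]) ** 2 for f in features)

    c0, c1 = centroid(0), centroid(1)
    # Positive scores mean "closer to the class-1 centroid".
    scores = [sq_dist(X[i], c0) - sq_dist(X[i], c1) for i in eval_idx]
    return auc(scores, [y[i] for i in eval_idx])


def grouping_scoring_modeling(X, y, groups, top_r=1, seed=0):
    """X: list of {feature: value}; y: 0/1 labels; groups: {group: [features]} (grouping step)."""
    rng = random.Random(seed)
    train, test = stratified_halves(list(range(len(X))), y, rng)
    fit, val = stratified_halves(train, y, rng)
    # Scoring: each group gets the AUC of a model trained on its features alone.
    group_scores = {g: centroid_auc(X, y, fit, val, feats) for g, feats in groups.items()}
    ranked = sorted(group_scores, key=group_scores.get, reverse=True)
    # Modeling: pool the features of the top-r groups and evaluate on held-out samples.
    selected = [f for g in ranked[:top_r] for f in groups[g]]
    return ranked, group_scores, centroid_auc(X, y, train, test, selected)
```

The function returns the ranked group names, their scores, and the held-out AUC of the model built from the top `top_r` groups; raising `top_r` trades a smaller, more interpretable gene set for potentially more predictive signal.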
Our comparative analysis highlights GediNET's exceptional performance, notably in terms of accuracy and Area Under the Curve metrics, underscoring its effectiveness in uncovering the genetic intricacies of breast cancer. Beyond outperforming traditional approaches, GediNET promises to improve disease classification and biomarker identification by deepening our understanding of the underlying biological mechanisms. The work shows that GediNET's integrative method can advance bioinformatics research by identifying the most informative genes associated with specific diseases, enabling targeted and personalized medicine.

Conference Object A comparative study on psychiatric disorders: Identification of shared pathways and common agents (Institute of Electrical and Electronics Engineers Inc., 2022) Kuzudisli, Cihan; Bakir-Gungor, Burcu; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
Distinct but closely related diseases generally present shared symptoms, which suggest possible overlaps among their pathogenic mechanisms. Identifying significantly impacted shared pathways and other common agents is expected to elucidate the etiology of these disorders and to help design better intervention strategies. In this research effort, we studied six psychiatric disorders: schizophrenia (SCZ), anorexia (AN), bipolar disorder (BD), depressive disorder (DD), autism (AU), and attention deficit hyperactivity disorder (ADHD). Our methodology consists of two parts: in Part I, common susceptibility genes, and in Part II, genome-wide association study (GWAS) data were used to find enriched pathways of psychiatric disorders. Fifty-nine KEGG pathways were identified in both parts, 31 of which are disease pathways. Pathways related to cancer and infectious diseases predominated. Most of the identified pathways are consistent with previous studies in the literature.
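Pathway enrichment of a gene list is commonly tested with the hypergeometric (one-sided Fisher) test. The abstract does not state which test or multiple-testing correction was used, so the hypergeometric tail probability and the Bonferroni correction below are assumptions, and the pathway and gene names are hypothetical.

```python
from math import comb


def hypergeom_sf(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K pathway genes, n selected genes)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_pathways(selected_genes, pathways, background_size, alpha=0.05):
    """Return {pathway: p} for pathways significant after Bonferroni correction."""
    selected = set(selected_genes)
    n_tests = len(pathways)
    result = {}
    for name, members in pathways.items():
        k = len(selected & members)  # selected genes that fall in this pathway
        p = hypergeom_sf(background_size, len(members), len(selected), k)
        if p * n_tests <= alpha:
            result[name] = p
    return result


# Hypothetical pathways and susceptibility genes over a 1000-gene background.
pathways = {
    "pathway_1": {f"g{i}" for i in range(10)},
    "pathway_2": {f"g{i}" for i in range(500, 510)},
}
susceptibility_genes = [f"g{i}" for i in range(8)] + [f"g{i}" for i in range(900, 912)]
print(enriched_pathways(susceptibility_genes, pathways, background_size=1000))
```

Running the same test on a susceptibility-gene list and on GWAS-derived genes, then intersecting the significant pathways, mirrors the two-part design described above.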
Combining susceptibility genes with GWAS data is an effective approach to identifying significantly impacted pathways in multifactorial diseases. Shared modules were then determined by applying hierarchical clustering to the enriched pathways. These modules may reveal how psychiatric disorders are associated with the enriched pathways. Taken together, common pathways and shared modules are expected to highlight the causative factors and important mechanisms behind complex psychiatric diseases and to support effective drug discovery.

Article CSA-DE-LR: enhancing cardiovascular disease diagnosis with a novel hybrid machine learning approach (PEERJ INC, 2024) Dedeturk, Beyhan Adanur; Dedeturk, Bilge Kagan; Bakir-Gungor, Burcu; 0000-0003-4983-2417; 0000-0002-8026-5003; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Dedeturk, Beyhan Adanur; Bakir-Gungor, Burcu
Cardiovascular diseases (CVD) are a leading cause of mortality globally, necessitating the development of efficient diagnostic tools. Machine learning (ML) and metaheuristic algorithms have become prevalent in addressing these challenges, providing promising solutions in medical diagnostics. However, traditional ML approaches often fall short in feature selection and optimization, leading to suboptimal performance in complex diagnostic tasks. To overcome these limitations, this study introduces a new hybrid method called CSA-DE-LR, which combines the clonal selection algorithm (CSA) and differential evolution (DE) with logistic regression. This integration is designed to optimize logistic regression weights efficiently for the accurate classification of CVD. The methodology employs three optimization strategies based on the F1 score, the Matthews correlation coefficient (MCC), and the mean absolute error (MAE).
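The idea of evolving logistic regression weights instead of fitting them by gradient descent can be sketched with differential evolution alone. This is a simplified illustration, not CSA-DE-LR: the clonal selection component is omitted, the standard DE/rand/1/bin scheme is used, fitness is the F1 score (one of the three strategies named above), and the population size, F, CR, weight bounds, and toy data are hypothetical choices.

```python
import random


def predict(weights, x):
    """Logistic regression decision: sigmoid(z) >= 0.5 is equivalent to z >= 0."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if z >= 0 else 0


def f1_fitness(weights, X, y):
    """F1 score of the weight vector on (X, y); used as the fitness to maximise."""
    tp = fp = fn = 0
    for xi, yi in zip(X, y):
        pred = predict(weights, xi)
        tp += int(pred == 1 and yi == 1)
        fp += int(pred == 1 and yi == 0)
        fn += int(pred == 0 and yi == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def differential_evolution(fitness, dim, pop_size=20, F=0.8, CR=0.9,
                           generations=100, bounds=(-5.0, 5.0), seed=0):
    """DE/rand/1/bin maximising `fitness` over real vectors inside `bounds`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = [
                min(hi, max(lo, pop[a][j] + F * (pop[b][j] - pop[c][j])))
                if rng.random() < CR or j == forced else pop[i][j]
                for j in range(dim)
            ]
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical two-feature toy data; weights = [bias, w1, w2].
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 0.8), (1.1, 0.9)]
y = [0, 0, 0, 1, 1, 1]
weights, score = differential_evolution(lambda w: f1_fitness(w, X, y), dim=3)
print(weights, score)
```

Because the fitness is evaluated directly, swapping `f1_fitness` for an MCC- or MAE-based function changes the optimization target without touching the search loop.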
Extensive evaluations on benchmark datasets, namely Cleveland and Statlog, reveal that CSA-DE-LR outperforms state-of-the-art ML methods. In addition, generalization is evaluated using the Breast Cancer Wisconsin Original (WBCO) and Breast Cancer Wisconsin Diagnostic (WBCD) datasets. Notably, the proposed model outperforms the models reported in previous studies in this domain. These findings highlight the potential of hybrid machine learning approaches for improving diagnostic accuracy and represent a significant advance in medical data analysis and CVD diagnosis.

Article The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease (Institute of Electrical and Electronics Engineers Inc., 2021) Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu; AGÜ, Faculty of Engineering, Department of Computer Engineering; Aydin, Zafer; Bakir-Gungor, Burcu
Behçet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors both play roles in the development of BD. To find its underlying causes, genetic changes across thousands of genes must be analyzed, and further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomic data and for selecting the most representative Single Nucleotide Polymorphisms (SNPs) as features for clinical diagnosis. In this study, we attempted to identify representative SNPs using feature selection methods that incorporate biological information, with the aim of developing a machine learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, a disease prediction accuracy of up to 99.64% is achieved using only 13,611 of 311,459 SNPs.
In addition, we identified the most distinctive SNPs by performing repeated feature selection in cross-validation experiments.

Article Developing a label propagation approach for cancer subtype classification problem (TUBITAK SCIENTIFIC & TECHNICAL RESEARCH COUNCIL TURKEY, ATATURK BULVARI NO 221, KAVAKLIDERE, ANKARA 00000, TURKEY, 2022) Guner, Pinar; Bakir-Gungor, Burcu; Coskun, Mustafa; 0000-0001-5979-0375; 0000-0002-2272-6270; 0000-0003-4805-1416; AGÜ, Faculty of Engineering, Department of Computer Engineering; Guner, Pinar; Bakir-Gungor, Burcu; Coskun, Mustafa
Cancer is a disease in which abnormal cells grow uncontrollably and invade other tissues. Several types of cancer have various subtypes with different clinical and biological implications, so treatment methods need to be customized accordingly. Identifying distinct cancer subtypes is an important problem in bioinformatics, since it can guide future precision medicine applications. To design targeted treatments, bioinformatics methods attempt to discover the common molecular pathology of different cancer subtypes. Along this line, several computational methods have been proposed to discover cancer subtypes or to stratify cancer into informative subtypes. However, existing works do not consider the sparseness of the data (genes with low degrees), which results in ill-conditioned solutions. To address this shortcoming, we propose an alternative unsupervised method that stratifies cancer patients into subtypes using applied numerical algebra techniques. More specifically, we applied a label propagation-based approach to stratify somatic mutation profiles of colon, head and neck, uterine, bladder, and breast tumors. We evaluated the performance of our method by comparing it with baseline methods.
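The label propagation idea can be illustrated by smoothing a patient's sparse binary mutation profile over a gene interaction network with the iteration F ← αSF + (1 − α)F0, where S is a normalized adjacency matrix and F0 the original profile. This is a generic sketch under stated assumptions, not the authors' method: the symmetric normalization S = D^(-1/2) W D^(-1/2), α = 0.7, and the toy path network are illustrative choices, and the abstract specifies neither how low-degree genes are handled nor how smoothed profiles are clustered into subtypes.

```python
import math


def normalized_adjacency(n, edges):
    """S = D^(-1/2) W D^(-1/2) for an unweighted, undirected gene network."""
    W = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        W[a][b] = W[b][a] = 1.0
    degree = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(S, f0, alpha=0.7, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * S F + (1 - alpha) * F0 until the update falls below `tol`."""
    n = len(f0)
    f = list(f0)
    for _ in range(max_iter):
        new = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * f0[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


# Hypothetical 4-gene path network 0-1-2-3; the patient carries a mutation in gene 0 only.
S = normalized_adjacency(4, [(0, 1), (1, 2), (2, 3)])
mutations = [1.0, 0.0, 0.0, 0.0]
smoothed = propagate(S, mutations)
print([round(v, 3) for v in smoothed])
```

After smoothing, neighbours of mutated genes receive non-zero scores, so two patients with mutations in different but interacting genes end up with more similar profiles than their raw binary vectors suggest.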
Extensive experiments demonstrate that our approach performs well on tumor classification tasks, substantially outperforming state-of-the-art unsupervised and supervised approaches.

Article Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods (FRONTIERS MEDIA SA, AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE CH-1015, SWITZERLAND, 2021) Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar; Nalbantoglu, O. Ufuk; Yousef, Malik; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu; Bulut, Osman; Jabeer, Amhar
The human gut microbiota is a complex community of organisms that includes trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause various diseases. In recent years, next-generation sequencing technologies have accelerated research on the human gut microbiota, and machine learning techniques have become popular for analyzing disease-associated metagenomic datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is a pressing need for a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomic data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized by disease state. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model. To deal with the high dimensionality of the features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the select K best approach.
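The "select K best" approach scores each feature on its own with a univariate statistic and keeps the top k. The sketch below uses the one-way ANOVA F statistic, which is what scikit-learn's `SelectKBest` computes by default through `f_classif`; whether the study used this statistic is not stated in the abstract, and the species names and abundances are hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    means = {lab: mean(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:  # perfectly constant within each class
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(features, labels, k):
    """Keep the k features (name -> per-sample values) with the largest F statistic."""
    scores = {name: anova_f(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical relative abundances; 0 = healthy, 1 = T2D.
labels = [0, 0, 0, 1, 1, 1]
abundances = {
    "species_1": [0.1, 0.2, 0.15, 0.9, 1.0, 0.95],
    "species_2": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
    "species_3": [0.2, 0.3, 0.25, 0.3, 0.4, 0.35],
}
print(select_k_best(abundances, labels, k=2))
```

Being univariate, this filter ignores redundancy between species, which is what methods such as Maximum Relevance and Minimum Redundancy are designed to address.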
To test the classification performance of the features selected by the different methods, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features considerably reduce the number of microbial taxa needed to diagnose T2D, thus reducing time and cost. Biological validation of these species showed that some of them are known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers. Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study uses several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and determines, by applying state-of-the-art feature selection methods, which subset of the microbiota is more informative than other taxa.

Conference Object The Effect of Different Classifiers on Recursive Cluster Elimination in the Analysis of Transcriptomic Data (Institute of Electrical and Electronics Engineers Inc., 2023) Bulut, Nurten; Bakir-Gungor, Burcu; Qaqish, Bahjat F.; Yousef, Malik; 0000-0002-1895-8749; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bulut, Nurten; Bakir-Gungor, Burcu
Gene expression data with a limited sample size and a large number of genes are frequently encountered in genetic studies. In such high-dimensional data, identifying the genes that distinguish between disease states is a challenging task. Feature selection (FS) is a useful approach for dealing with high dimensionality. Support Vector Machines Recursive Cluster Elimination (SVM-RCE) is an FS technique for high-dimensional data. SVM-RCE has been used to identify clusters of genes whose expression levels correlate with pathological state.
A key step in SVM-RCE is the use of an SVM classifier to assign an area under the curve (AUC) score to each gene cluster based on its ability to predict class labels. In this study, we investigate the use of alternative classifiers in the cluster-scoring step. Specifically, we compare Support Vector Machines, Random Forest, XGBoost, Naive Bayes, and linear logistic regression. In addition to AUC performance, the algorithms are compared in terms of the number of genes selected at different levels of clustering and in terms of running time.

Conference Object Effect of Recursive Cluster Elimination with Different Clustering Algorithms Applied to Gene Expression Data (Institute of Electrical and Electronics Engineers Inc., 2023) Kuzudisli, Cihan; Bakir-Gungor, Burcu; Qaqish, Bahjat F.; Yousef, Malik; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
Feature selection (FS) is an effective tool for dealing with high dimensionality and reducing computational cost. Support Vector Machines – Recursive Cluster Elimination (SVM-RCE) is one of several algorithms that have been developed for FS in high-dimensional data. SVM-RCE involves a clustering step that originally uses k-means. Three alternative clustering algorithms are evaluated in this context using various performance metrics: k-medoids, Hierarchical Clustering (HC), and Gaussian Mixture Model (GMM). Comparisons are carried out on five publicly available gene expression datasets. The results show that k-means in SVM-RCE achieves higher classification performance than the other tested algorithms, while HC performs similarly to k-means. Overall, our findings favor k-means.
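The recursive cluster elimination loop described in these two studies can be sketched as follows: cluster the genes, score each cluster by how well a classifier trained only on its genes separates the classes (AUC), drop the lowest-scoring clusters, and repeat on the survivors. This stdlib-only sketch makes simplifying assumptions: Lloyd's k-means clusters genes by their training-sample profiles, a nearest-centroid classifier with a single stratified holdout split replaces the SVM (or alternative classifiers) and the cross-validation used in SVM-RCE, and the parameters `n_clusters`, `keep_fraction`, and `min_genes` are hypothetical.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kmeans(vectors, k, rng, iters=20):
    """Lloyd's k-means; returns one cluster index per input vector."""
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = []
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


def cluster_auc(expr, genes, labels, train, test):
    """Score a gene cluster with a nearest-centroid classifier on held-out samples."""
    def centroid(label):
        cols = [s for s in train if labels[s] == label]
        return [mean(expr[g][s] for s in cols) for g in genes]

    c0, c1 = centroid(0), centroid(1)
    scores = []
    for s in test:
        x = [expr[g][s] for g in genes]
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        scores.append(d0 - d1)  # larger means closer to the class-1 centroid
    return auc(scores, [labels[s] for s in test])


def recursive_cluster_elimination(expr, labels, n_clusters=4, keep_fraction=0.5,
                                  min_genes=2, seed=0):
    """expr maps gene -> list of expression values, one per sample."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):  # stratified 50/50 split of the samples
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        train += idx[: len(idx) // 2]
        test += idx[len(idx) // 2 :]
    genes = list(expr)
    while len(genes) > min_genes:
        k = min(n_clusters, len(genes))
        profiles = [[expr[g][s] for s in train] for g in genes]
        clusters = {}
        for g, c in zip(genes, kmeans(profiles, k, rng)):
            clusters.setdefault(c, []).append(g)
        ranked = sorted(clusters.values(),
                        key=lambda cl: cluster_auc(expr, cl, labels, train, test),
                        reverse=True)
        n_keep = max(1, int(len(ranked) * keep_fraction))
        survivors = [g for cl in ranked[:n_keep] for g in cl]
        if len(survivors) == len(genes):  # nothing was eliminated; stop
            break
        genes = survivors
    return genes
```

Swapping `kmeans` for k-medoids or hierarchical clustering, or `cluster_auc` for another classifier, reproduces the kind of comparison these two studies perform, while the elimination loop stays unchanged.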
This study can contribute to the development of SVM-RCE variants that select fewer genes while achieving higher prediction performance.

Article Ensemble feature selection and classification methods for machine learning-based coronary artery disease diagnosis (ELSEVIER, 2023) Kolukisa, Burak; Bakir-Gungor, Burcu; 0000-0002-2272-6270; 0000-0003-0423-4595; AGÜ, Faculty of Engineering, Department of Computer Engineering; Kolukısa, Burak; Bakır Güngör, Burcu
Coronary artery disease (CAD) is a condition in which the heart muscle receives an insufficient blood supply as a result of the accumulation of fatty deposits. As reported by the World Health Organization, around 32% of all deaths in the world are caused by CAD, and an estimated 23.6 million people will die from this disease in 2030. CAD develops over time, and it is difficult to diagnose until a blockage or a heart attack occurs. To avoid the side effects and high costs of current diagnostic methods, researchers have proposed diagnosing CAD with computer-aided systems that analyze physical and biochemical values at a lower cost. In this study, for CAD diagnosis, (i) seven computational feature selection (FS) methods, one domain knowledge-based FS method, and several classification algorithms were evaluated; and (ii) an exhaustive ensemble FS method and a probabilistic ensemble FS method were proposed. The proposed approach was tested on three publicly available CAD data sets using six classification algorithms and four variants of voting algorithms. The performance metrics were comparatively evaluated for numerous combinations of classifiers and FS methods. The multi-layer perceptron classifier obtained satisfactory results on all three data sets.
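One simple way to combine several feature selectors is vote counting: keep a feature if at least a given number of the base methods selected it. This is a generic illustration, not the exhaustive or probabilistic ensemble FS methods proposed in the study, which the abstract does not detail; the selector names and their outputs are hypothetical, although the feature names are attributes of the Cleveland heart disease data set.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the base selectors."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical outputs of three base feature selectors on Cleveland attributes.
selections = {
    "selector_a": ["age", "cp", "thal", "ca"],
    "selector_b": ["cp", "thal", "oldpeak"],
    "selector_c": ["age", "cp", "ca"],
}
for threshold in (1, 2, 3):
    print(threshold, vote_features(selections, threshold))
```

Sweeping the threshold trades feature-set size against agreement between methods: a threshold of 1 returns the union of all selections, and a threshold equal to the number of selectors returns their intersection. Each resulting subset would then be passed to the classifiers for evaluation.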
Performance evaluations show that the proposed approach achieved accuracies of 91.78%, 85.55%, and 85.47% on the Z-Alizadeh Sani, Statlog, and Cleveland data sets, respectively.

Article Ensemble Feature Selection for Clustering Damage Modes in Carbon Fiber-Reinforced Polymer Sandwich Composites Using Acoustic Emission (John Wiley and Sons Inc, 2024) Gulsen, Abdulkadir; Kolukisa, Burak; Caliskan, Umut; Bakir-Gungor, Burcu; Gungor, Vehbi Cagri; 0000-0002-4250-2880; 0000-0003-0423-4595; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Gulsen, Abdulkadir; Kolukisa, Burak; Bakir-Gungor, Burcu
Acoustic emission (AE) is a noninvasive technique for real-time structural health monitoring that captures the stress waves produced by the formation and growth of cracks within a material. This study presents a novel ensemble feature selection methodology that ranks features by their relevance to damage modes in AE signals gathered from edgewise compression tests on honeycomb-core carbon fiber-reinforced polymer. Two features, amplitude and peak frequency, are selected for labeling the AE signals. An ensemble supervised feature selection method ranks feature importance according to these labels. Using the resulting ranking list, unsupervised clustering models are then applied to identify damage modes. The comparative results reveal a strong correlation between the damage modes and the counts and energy features when amplitude is selected. Similarly, when peak frequency is chosen, a significant association is observed between the damage modes and the partial power 1 and partial power 2 features.
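The abstract does not say how the ensemble combines the rankings of its base selectors. Averaging each feature's rank position across selectors (a Borda-style aggregation) is one common choice and is assumed in this sketch; the base rankings are hypothetical, although counts, energy, duration, and rise time are standard AE hit features.

```python
def aggregate_rankings(rankings):
    """Order features by mean position across rankings (position 1 = most important)."""
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("every ranking must contain the same features")
    mean_rank = {f: sum(r.index(f) + 1 for r in rankings) / len(rankings) for f in features}
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical importance rankings from three supervised selectors (labels from amplitude).
rankings = [
    ["energy", "counts", "duration", "rise_time"],
    ["counts", "energy", "rise_time", "duration"],
    ["counts", "duration", "energy", "rise_time"],
]
print(aggregate_rankings(rankings))  # ['counts', 'energy', 'duration', 'rise_time']
```

The top of the aggregated list would then be used as input to the unsupervised clustering models that separate damage modes.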
These findings demonstrate that, in addition to the commonly used features, other features, such as partial powers, also correlate with damage modes.

Article eTNT: Enhanced TextNetTopics with Filtered LDA Topics and Sequential Forward / Backward Topic Scoring Approaches (SCIENCE & INFORMATION-SAI ORGANIZATION LTD, 2024) Voskergian, Daniel; Jayousi, Rashid; Bakir-Gungor, Burcu; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakir-Gungor, Burcu
TextNetTopics is a novel text classification-based topic modelling approach that focuses on topic selection rather than individual word selection to train a machine learning algorithm. However, one key limitation of TextNetTopics is its scoring component, which evaluates each topic in isolation and ranks topics accordingly, ignoring potential relationships between them. In addition, the chosen topics may contain redundant or irrelevant features, which can increase the feature set size and introduce noise that degrades overall model performance. To address these limitations and improve classification performance, this study introduces eTNT, an enhancement of TextNetTopics. eTNT integrates two novel scoring approaches, Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which account for topic interactions by assessing sets of topics simultaneously. Moreover, it incorporates a filtering component that aims to improve the quality and discriminative power of topics by removing non-informative features from each topic using Random Forest feature importance values. These components aim to streamline the topic selection process and improve classifier efficiency for text classification.
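Sequential Forward Topic Scoring can be sketched as a greedy forward search over topics: at each step, the topic whose inclusion gives the best score for the whole selected set is added, so each topic is judged in the context of those already chosen rather than in isolation. This is an illustration under stated assumptions: in eTNT the set score comes from a trained classifier, whereas here `score_fn` is a hypothetical stand-in that counts how many informative words the selected topics cover, and ties are broken alphabetically.

```python
def sequential_forward_topic_scoring(topics, score_fn):
    """Greedy forward pass; returns (topic, score of the set up to and including it) pairs."""
    selected, history = [], []
    remaining = sorted(topics)
    while remaining:
        candidate_scores = {
            t: score_fn(set().union(*(topics[s] for s in selected + [t])))
            for t in remaining
        }
        best = max(remaining, key=candidate_scores.__getitem__)  # first maximum wins ties
        selected.append(best)
        remaining.remove(best)
        history.append((best, candidate_scores[best]))
    return history


# Hypothetical topics (word sets) and a toy set score: informative words covered.
topics = {"T1": {"a", "b", "c"}, "T2": {"a", "b"}, "T3": {"d"}, "T4": {"x", "y"}}
informative = {"a", "b", "c", "d"}
result = sequential_forward_topic_scoring(topics, lambda words: len(words & informative))
print(result)  # [('T1', 3), ('T3', 4), ('T2', 4), ('T4', 4)]
```

Here T2 scores well in isolation but adds nothing once T1 is chosen, which is exactly the redundancy that per-topic scoring misses. By analogy with sequential backward feature selection, a backward pass would start from the full topic set and drop one topic at a time.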
The results obtained on the WOS-5736, LitCovid, and MultiLabel datasets indicate that eTNT is more effective than its counterpart, TextNetTopics.

Conference Object Evaluation of Classification Algorithms, Linear Discriminant Analysis and a New Hybrid Feature Selection Methodology for the Diagnosis of Coronary Artery Disease (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2018) Kolukisa, Burak; Hacilar, Hilal; Goy, Gokhan; Kus, Mustafa; Bakir-Gungor, Burcu; Aral, Atilla; Gungor, Vehbi Cagri; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
According to the World Health Organization (WHO), 31% of the world's total deaths in 2016 (17.9 million) were due to cardiovascular diseases (CVD). With the development of information technologies, it has become possible to predict at lower cost whether people have heart disease by checking certain physical and biochemical values. In this study, we evaluated a set of classification algorithms and linear discriminant analysis, and proposed a new hybrid feature selection methodology for the diagnosis of coronary heart disease (CHD).
Throughout this research effort, using three publicly available heart disease diagnosis datasets from the UCI Machine Learning Repository, we conducted comparative performance evaluations in terms of accuracy, sensitivity, specificity, F-measure, AUC, and running time.

Article GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning (FRONTIERS MEDIA SA, 2023) Ersoz, Nur Sebnem; Bakir-Gungor, Burcu; Yousef, Malik; 0000-0003-3343-9936; 0000-0002-2272-6270; AGÜ, Yaşam ve Doğa Bilimleri Fakültesi, Moleküler Biyoloji ve Genetik Bölümü; Ersoz, Nur Sebnem; Bakir-Gungor, Burcu
Introduction: Identifying significant sets of genes that are up- or downregulated under specific conditions is vital for understanding disease development mechanisms at the molecular level. Along this line, several computational feature selection (i.e., gene selection) methods have been proposed for analyzing transcriptomic data. On the other hand, uncovering the core functions of the selected genes provides a deeper understanding of diseases. To address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such resource; it provides ontology terms describing the molecular function, cellular component, and biological process of a gene product.
Methods: In this study, we developed a tool named GeNetOntology, which performs GO-based feature selection for gene expression data analysis. In the proposed approach, a Grouping, Scoring, and Modeling (G-S-M) process is used to identify significant GO terms.
GO information is used as the grouping information and is embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms are then used to train the ML model for the classification task. The output is a set of important ontology terms for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach was tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms for use in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists in identifying a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.

Conference Object Graph-based Biomedical Knowledge Discovery (IEEE, 2024) Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gokhan; 0000-0003-2897-3894; 0000-0002-2272-6270; AGÜ, Mühendislik Fakültesi, Elektrik - Elektronik Mühendisliği Bölümü; Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gökhan
Digitalization is advancing at a very high pace all over the world. While this brings many conveniences to daily life, it also raises the problem of analyzing and processing the enormous volume of digital data it produces. The same holds for published academic studies: reaching the novel information contained in them requires a laborious process of evaluating each study individually. Therefore, in this study, publications retrieved for specific target diseases were analyzed with text analysis methods and converted into a graph structure in which meaningful terms are connected through biomedical relations.
On the resulting dense graph, pairs of biomedical entities connected by important relations such as treats, causes, and associated_with were queried. The entity pairs returned by these queries were also verified through manual search and confirmed to be genuine relations. With this study, retrieving known biomedical entity pairs through the proposed approach aims to solve the problem of time-consuming manual searching. The approach also has the potential to uncover unknown or undiscovered new relations (treats, causes, associated_with, etc.) through patterns that chain multiple pairwise links.
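The Graph-based Biomedical Knowledge Discovery entry describes two operations: querying entity pairs by relation type, and chaining pairwise links to suggest relations not yet stated directly. The following is a minimal sketch of both ideas under stated assumptions. The graph is held as (subject, relation, object) triples. The entity names are hypothetical placeholders, not results from the study. The two-hop pattern follows the ABC model of literature-based discovery, which is one possible reading of "patterns that chain multiple pairwise links" and not necessarily the authors' method. The function names `query_pairs` and `two_hop_candidates` are also hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object); entity names are
# placeholders for illustration only, not findings from the study.
TRIPLES = [
    ("drug_A", "treats", "disease_X"),
    ("gene_G", "associated_with", "disease_X"),
    ("drug_A", "associated_with", "gene_G"),
    ("drug_B", "associated_with", "gene_G"),
    ("toxin_T", "causes", "disease_Y"),
]


def query_pairs(triples, relation):
    """Return (subject, object) pairs linked by the given relation type."""
    return [(s, o) for s, r, o in triples if r == relation]


def two_hop_candidates(triples, relation_1, relation_2):
    """ABC-style open discovery (assumed technique): find (A, C) pairs with
    A -relation_1-> B and B -relation_2-> C for some intermediate B, where
    A and C share no direct edge in either direction.
    Returns a dict mapping (A, C) to the sorted list of intermediates B."""
    # Every directly connected pair, regardless of relation type.
    direct = {(s, o) for s, _, o in triples}
    # Index outgoing neighbours by (node, relation) for fast second hops.
    out_edges = defaultdict(list)
    for s, r, o in triples:
        out_edges[(s, r)].append(o)

    candidates = defaultdict(set)
    for a, r, b in triples:
        if r != relation_1:
            continue
        for c in out_edges.get((b, relation_2), []):
            # Skip self-loops and pairs that are already linked directly.
            if c != a and (a, c) not in direct and (c, a) not in direct:
                candidates[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in candidates.items()}


if __name__ == "__main__":
    print(query_pairs(TRIPLES, "treats"))
    print(two_hop_candidates(TRIPLES, "associated_with", "associated_with"))
```

With the toy triples, the two-hop query returns `{('drug_B', 'disease_X'): ['gene_G']}`: drug_B reaches disease_X only through gene_G, while drug_A is excluded because it already has a direct treats edge to disease_X. Such candidates are hypotheses to be checked, for example by the manual search the entry describes.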