Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395

Citation - Scopus: 1
Words Speak Louder Than Actions: Decoding Emotions Through NLP
(Institute of Electrical and Electronics Engineers Inc., 2024-10-26) Paksoy, Melda; Bakal, Gokhan
Emotion detection in text remains a significant challenge in Natural Language Processing because human emotions are complex and subtly nuanced. This paper presents multiple experimental models for emotion classification using an up-to-date dataset curated to address 13 emotions implied in Twitter posts. We evaluated various machine learning (ML) models, including Logistic Regression, Random Forest, SVM, and XGBoost, alongside deep learning (DL) architectures such as LSTM and CNN. Our results demonstrate the efficacy of deep learning models, particularly the CNN model, which achieved an F1 score of 0.99. This study contributes to emotion detection capabilities, paving the way for more nuanced and accurate sentiment analysis (SA) in various text analysis applications. © 2025 Elsevier B.V., All rights reserved.
Citation - Scopus: 1
TextNetTopics_TIS: Enhancing TextNetTopics With Random Forest-Based Topic Importance Scoring
(Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Voskergian, Daniel; Bakir-Güngör, Burcu; Yousef, Malik
TextNetTopics is an innovative Latent Dirichlet Allocation-based topic selection method for training text classification models. One main limitation is its computationally intensive scoring mechanism, especially when applied to many topics. This scoring mechanism involves training a machine learning model (i.e., Random Forest) on each topic using the Monte-Carlo Cross-Validation approach and assigning a score value based on a specific performance metric (e.g., accuracy or F1-score). Moreover, the measured score does not account for the interactions between all features residing in all topics. This paper presents a new topic-scoring mechanism called Topic Importance Scoring. This computationally efficient approach trains a Random Forest model on all topics simultaneously and leverages the extracted feature importance values to give each topic a score reflecting its classification potential. The experiments on three diverse datasets confirm that the proposed method's performance is superior to the Topic Performance Scoring, which was used in the original TextNetTopics method. © 2024 Elsevier B.V., All rights reserved.
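The Topic Importance Scoring idea described in this abstract can be illustrated with a minimal sketch. It assumes the per-word importances have already been extracted from a Random Forest trained on all topic words, and it assumes a topic's score is the sum of its words' importances; the abstract does not state the aggregation rule, so the summation, the function name `topic_importance_scores`, and the sample values are all hypothetical.

```python
def topic_importance_scores(feature_importance, topics):
    """Score each topic from the importances of the words it contains.

    Summation is an assumption made for illustration; the abstract does
    not specify how importances are aggregated per topic.
    """
    scores = {}
    for topic, words in topics.items():
        # Words missing from the importance table contribute nothing.
        scores[topic] = sum(feature_importance.get(w, 0.0) for w in words)
    return scores


# Hypothetical importances, standing in for values a Random Forest
# trained on all topic words at once might report.
importance = {"gene": 0.30, "protein": 0.25, "cell": 0.30,
              "market": 0.05, "stock": 0.10}
topics = {"biology": ["gene", "protein", "cell"],
          "finance": ["market", "stock"]}

scores = topic_importance_scores(importance, topics)
# Rank topics from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because a single model is trained once on all topics, scoring costs one pass over the importance table rather than one training run per topic.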
Citation - Scopus: 3
Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring (G-S-M) Approach
(Elsevier Ltd, 2025-03) Bakir-Güngör, Burcu; Temiz, Mustafa; Canakcimaksutoglu, Beyza; Yousef, Malik
Colorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and serves as a biomarker for early detection and treatment. This research effort focuses on the identification of potential taxonomic biomarkers of CRC using a grouping-based feature selection method. Additionally, this study investigates the effect of incorporating biological domain knowledge into the feature selection process while identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose a taxonomy-based Grouping Scoring Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. In this study, using metagenomic data related to CRC, classification is performed at three taxonomic levels (genus, family and order). The MetaPhlAn tool is employed to determine the relative abundance values of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. In experiments on two CRC-associated metagenomics datasets, the highest performance, an AUC of 0.90, is observed at the genus taxonomic level. At this level, 7 out of the top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas and Solobacterium) were identified in both datasets. Moreover, the identified microorganisms at genus, family, and order levels are thoroughly discussed by referring to CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development, but also highlights the applicability of the taxonomy-based G-S-M method in tackling various diseases. © 2025 Elsevier B.V., All rights reserved.
Metabolomics Data Analysis to Discover Chronic Granulomatous Disease-Associated Biomarkers Utilizing G-S-M Machine Learning Model via Grouping Metabolites According to Ion Type
(Institute of Electrical and Electronics Engineers Inc., 2024-10-16) Ersöz, Nur Sebnem; Bakir-Güngör, Burcu; Yousef, Malik
Chronic Granulomatous Disease (CGD) is a rare, inherited immunodeficiency disorder characterized by white blood cells unable to effectively kill certain bacteria and fungi. This defect results in the formation of clusters of immune cells called granulomas that form at sites of infection or inflammation. Therefore, identification of disease-related biomarkers is a critical step in advancing precision medicine and improving diagnostic accuracy. In this study, we applied a G-S-M machine learning approach to metabolomics data to uncover CGD-associated biomarkers. We obtained a metabolomics dataset from Gene Expression Omnibus with accession number GSE220260. The data include 85 samples (16 healthy controls and 69 CGD samples) with comprehensive metabolic profiles obtained using liquid chromatography-mass spectrometry analysis. The dataset includes metabolite names with their ion type and formula. In order to identify CGD-related metabolites and their ion types, G-S-M was used as a grouping function when performing machine learning oriented metabolomics data analysis. We performed the G-S-M approach by grouping metabolites according to their ion type. In the training part of the G-S-M approach, metabolites annotated with selected ion types were utilized to perform a two-class classification task, which generates a set of important ion types as output. We also compared the performance results of the G-S-M machine learning model with traditional feature selection (TFS) methods (XGB, SKB, IG, FCBF, MRMR, and CMIM) using a random forest classifier. Monte-Carlo Cross-Validation with 100 repetitions was used in our experiments. It was observed that the G-S-M, XGB, SKB and FCBF methods provided similarly high performances. Beyond its performance, the G-S-M method, unlike TFS, used groups based on ion types and then identified relevant Chronic Granulomatous Disease-associated metabolites. © 2024 Elsevier B.V., All rights reserved.
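The Monte-Carlo Cross-Validation protocol mentioned in this abstract (100 repeated random train/test splits) can be sketched as follows. The sample count of 85 and the 100 repetitions come from the abstract; the 10% test fraction, the seed, and the function name `monte_carlo_splits` are assumptions chosen only for illustration.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=100, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Each repetition reshuffles all samples independently, so a sample may
    land in the test set many times or never. The test fraction is an
    assumed value; the abstract does not report it.
    """
    rng = random.Random(seed)
    n_test = max(1, int(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# 85 samples, as in the dataset described above.
splits = list(monte_carlo_splits(85))
```

A model would be trained and evaluated once per split, and the reported performance would be the average over all repetitions.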
Citation - Scopus: 1
Improving Salary Offer Processes With Classification Based Machine Learning Models
(Institute of Electrical and Electronics Engineers Inc., 2024-09-21) Kaya, Rukiye; Saatci, Mehtap; Bakal, Gokhan; Bakal, Mehmet Gokhan
In job applications, salary is a major motivational factor for employees, and making accurate salary predictions is crucial for both employers and employees. Utilizing advanced technologies can significantly enhance the accuracy and efficiency of the salary prediction process. In this study, we explore Machine Learning (ML) methods to enhance the salary prediction process. We evaluated seven classification models for predicting salary categories, with the Artificial Neural Network (ANN) model achieving the highest accuracy at 58.2% on the test dataset, followed by the K-Nearest Neighbors (KNN) model with an accuracy of 56.8%. Additionally, we employed ensemble models to further enhance prediction accuracy. Among these, the Majority Voting Classifier using Hard Voting achieved the highest accuracy at 59.3%, demonstrating the potential of ensemble techniques in refining salary predictions. The developed salary prediction tool estimates the most appropriate salary category for each candidate and helps mitigate potential biases in manual salary assessments, hence enabling a more objective and consistent compensation system. © 2024 Elsevier B.V., All rights reserved.
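Hard voting, the ensemble rule named in this abstract, assigns each candidate the category predicted by the most base models. The sketch below illustrates that rule only; the model names, salary categories, and predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority (hard) voting.

    `predictions` is a list with one label list per model, all of equal
    length. On a tie, the label seen first among the tied ones wins,
    which follows the ordering of Counter.most_common.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical salary-category predictions from three base models.
ann = ["low", "mid", "high", "mid"]
knn = ["low", "high", "high", "mid"]
tree = ["mid", "mid", "high", "low"]

ensemble = hard_vote([ann, knn, tree])
```

Using an odd number of base models reduces ties, although ties remain possible when there are more than two categories.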
Citation - Scopus: 1
From Traditional to Deep: Evaluating Sentiment Analysis Models on a Large-Scale Tweet Dataset
(Institute of Electrical and Electronics Engineers Inc., 2024-10-26) Mammadov, Alisahib; Bakal, Gokhan
This study investigates the effectiveness of various machine learning (ML) and deep learning (DL) techniques for large-scale sentiment analysis on Twitter data. We leverage a publicly available dataset of one million tweets, annotated with four sentiment labels (positive, negative, uncertainty, and litigious), to train and evaluate a range of models. Our experiments demonstrate that traditional ML algorithms, particularly XGBoost, achieve high performance, with the best F1 score reaching 95.81% using a combination of unigrams and bigrams. Among DL models, a hybrid CNN-BiGRU architecture yields the highest average F1 score of 95.42%. Our findings highlight the strengths of different approaches for sentiment analysis on Twitter data and emphasize the importance of data preprocessing and model selection for achieving optimal performance. © 2025 Elsevier B.V., All rights reserved.
Evaluating the Impact of Sentiment Analysis on Deep Reinforcement Learning-Based Trading Strategies
(Institute of Electrical and Electronics Engineers Inc., 2024-10-26) Etcil, Mustafa; Kolukisa, Burak; Bakir-Güngör, Burcu
Portfolio optimization is a form of investment management that aims to maximize returns while minimizing risks. However, the inherent complexity and unpredictability of financial markets pose a challenge. Recent advancements in machine learning, particularly in deep reinforcement learning (DRL), offer promising solutions by enabling dynamic and adaptive trading strategies. This paper presents a comprehensive evaluation of three actor-critic-based DRL algorithms, namely Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO), applied to portfolio optimization. These strategies were implemented in both sentiment-aware and non-sentiment-aware versions, allowing for a direct comparison of their performance. The sentiment-aware models incorporated sentiment analysis using FinBERT and knowledge graphs to measure market sentiment from financial news, while the non-sentiment-aware models relied solely on stock prices and technical indicators. Our comparative study demonstrates that incorporating sentiment analysis resulted in consistently superior risk-adjusted returns and portfolio resilience during market fluctuations compared to non-sentiment-aware strategies. © 2025 Elsevier B.V., All rights reserved.
Citation - Scopus: 2
Data-Driven Methods for Optimal Setting of Legacy Control Devices in Distribution Grids
(IEEE Computer Society, 2024-07-21) Savasci, Alper; Ceylan, Oǧuzhan; Paudyal, Sumit
This study presents machine learning-based dispatch strategies for legacy voltage regulation devices, i.e., on-load tap changers (OLTCs), step-voltage regulators (SVRs), and switched capacitors (SCs) in modern distribution networks. The proposed approach utilizes k-nearest neighbor (KNN), random forest (RF), and neural networks (NN) to map nodal net active and reactive injections to the optimal legacy controls and resulting voltage magnitudes. To implement these strategies, first, an efficient optimal power flow (OPF) is formulated as a mixed-integer linear program that obtains optimal decisions of tap positions for OLTCs and SVRs and the on/off status of SCs. Then, training and testing datasets are generated by solving the OPF model for daily horizons with 1-hr resolution for varying loading and photovoltaic (PV) generation profiles. Case studies on the 33-node feeder demonstrate high-accuracy mapping between the input features and the output vector, which is promising for integrated Volt/VAr control schemes. © 2024 Elsevier B.V., All rights reserved.
Citation - Scopus: 6
CSA-DE-LR: Enhancing Cardiovascular Disease Diagnosis With a Novel Hybrid Machine Learning Approach
(PeerJ Inc., 2024-07-18) Dedeturk, Beyhan Adanur; Dedeturk, Bilge Kagan; Bakir-Güngör, Burcu
Cardiovascular diseases (CVD) are a leading cause of mortality globally, necessitating the development of efficient diagnostic tools. Machine learning (ML) and metaheuristic algorithms have become prevalent in addressing these challenges, providing promising solutions in medical diagnostics. However, traditional ML approaches often fall short in feature selection and optimization, leading to suboptimal performance in complex diagnostic tasks. To overcome these limitations, this study introduces a new hybrid method called CSA-DE-LR, which combines the clonal selection algorithm (CSA) and differential evolution (DE) with logistic regression. This integration is designed to optimize logistic regression weights efficiently for the accurate classification of CVD. The methodology employs three optimization strategies based on the F1 score, the Matthews correlation coefficient (MCC), and the mean absolute error (MAE). Extensive evaluations on benchmark datasets, namely Cleveland and Statlog, reveal that CSA-DE-LR outperforms state-of-the-art ML methods. In addition, generalization is evaluated using the Breast Cancer Wisconsin Original (WBCO) and Breast Cancer Wisconsin Diagnostic (WBCD) datasets. Significantly, the proposed model demonstrates superior efficacy compared to previous research studies in this domain. This study’s findings highlight the potential of hybrid machine learning approaches for improving diagnostic accuracy, offering a significant advancement in the fields of medical data analysis and CVD diagnosis. © 2024 Elsevier B.V., All rights reserved.
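The three quantities this abstract names as optimization targets (F1 score, MCC, and MAE) have standard definitions for binary labels, sketched below. This only shows how the metrics are computed; it is not the CSA-DE-LR optimizer, and the function name `binary_metrics` and the sample labels are hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return (F1, MCC, MAE) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four confusion-matrix cells; 0.0 when undefined.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # MAE on hard 0/1 labels equals the misclassification rate.
    mae = sum(abs(t - p) for t, p in pairs) / len(pairs)
    return f1, mcc, mae


# Hypothetical labels: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
f1, mcc, mae = binary_metrics(y_true, y_pred)
```

F1 and MCC are to be maximized while MAE is to be minimized, so an optimizer that uses them as fitness functions must treat the direction of improvement accordingly.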
Citation - Scopus: 1
Integrative Analyses in Omics Data: Machine Learning Perspective
(Deutsche Gesellschaft fur Medizinische Informatik, Biometrie und Epidemiologie e.V., 2023) Ünlü Yazici, Miray; Bakir-Güngör, Burcu; Yousef, Malik; Yazici, Miray Unlu
Developments in high-throughput technologies have enabled the production of an immense amount of knowledge at the multi-omics level. For complex diseases, which are affected by multiple factors, single omics datasets might not be sufficient to unveil the molecular mechanisms of heterogeneous diseases. Providing a comprehensive and systematic overview to explain disease hallmarks in significant depth is critical. Utilizing multi-omics datasets has led to the development of a variety of tools and platforms. Machine learning models are utilized in a wide variety of tools to tackle the complexity of disorders and to identify new biomolecular signatures and potential markers. These approaches are based on training models to make predictions and classify the given data. In this review, we describe current machine learning-based approaches and available implementations. Challenges in elucidating the mechanisms of disease onset and progression, and the future development of the field of medicine, will be discussed. The importance of interpreting model output in light of corresponding biological knowledge will also be covered in this review. © 2023 Elsevier B.V., All rights reserved.
