Scopus-Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/20.500.12573/395
18 results
Search Results
Article
AI-Driven Drug Repositioning: A Diffusion Model Approach on Knowledge Graphs (Elsevier, 2026)
Erkantarci, Betul; Şen, Tarık Üveys; Bakal, Gokhan
Drug repositioning - discovering new therapeutic applications for existing drugs - offers a promising pathway to accelerate cancer treatment development. This study proposes a diffusion model-driven framework that leverages biomedical knowledge graphs and graph-based learning to enhance drug repositioning predictions. The framework integrates data from the Semantic MEDLINE Database (SemMedDB), the Unified Medical Language System (UMLS), and the Repurposing Drugs Database (RepoDB) to construct a comprehensive therapeutic knowledge graph. Drug embeddings are generated using a one-layer Relational Graph Convolutional Network (R-GCN) incorporating semantic type-guided structural perturbations. These embeddings are refined through a flow-matching algorithm to denoise and reconstruct biologically meaningful representations. To evaluate the model's effectiveness, we apply a consensus strategy using Cosine Similarity, Euclidean Distance, and Manhattan Distance as proximity metrics. The model successfully identified, on average, 74 candidate drugs for repositioning in the context of leukemia. Qualitative analysis using t-distributed stochastic neighbor embedding (t-SNE) revealed enhanced clustering of pharmacologically relevant drugs in the denoised embedding space. Trastuzumab, in particular, emerged as a strong repositioning candidate for leukemia, supported by 156 co-mentions in PubMed. These findings demonstrate that the proposed framework improves embedding robustness and semantic fidelity, offering a powerful artificial intelligence (AI)-driven approach for precision oncology.
Integrating structural noise modeling with diffusion-based denoising advances the discovery of novel drug-disease associations and holds potential for translational research and clinical hypothesis generation in drug repurposing.

Conference Object
A Data-Driven Framework for Predicting Flight Arrival Delays Using Integrated Aviation and Meteorological Data (Institute of Electrical and Electronics Engineers Inc., 2025-09-17)
Akyuz, Muhammed Sefa; Bakal, Gokhan

Conference Object
Citation - Scopus: 1
Words Speak Louder Than Actions: Decoding Emotions Through NLP (Institute of Electrical and Electronics Engineers Inc., 2024-10-26)
Paksoy, Melda; Bakal, Gokhan
Emotion detection in text remains a significant challenge in Natural Language Processing due to the complexity and subtle nuances of human emotions. This paper presents multiple experimental models for emotion classification using an up-to-date dataset curated to address 13 emotions implied in Twitter posts. We evaluated various machine learning (ML) models, including Logistic Regression, Random Forest, SVM, and XGBoost, alongside deep learning (DL) architectures such as LSTM and CNN. Our results demonstrate the efficacy of deep learning models, particularly the CNN model, which achieved an F1 score of 0.99. This study contributes to emotion detection capabilities, paving the way for more nuanced and accurate sentiment analysis (SA) in various text analysis applications. © 2025 Elsevier B.V., All rights reserved.

Conference Object
Text Classification Experiments on Contextual Graphs Built by N-Gram Series (Springer International Publishing AG, 2025)
Sen, Tarik Uveys; Yakit, Mehmet Can; Gumus, Mehmet Semih; Abar, Orhan; Bakal, Gokhan
Traditional n-gram textual features, commonly employed in conventional machine learning models, offer lower performance rates on high-volume datasets compared to modern deep learning algorithms, which have been intensively studied for the past decade.
The main reason for this performance disparity is that deep learning approaches handle textual data through the word vector space representation, which captures contextually hidden information more effectively. Nonetheless, the potential of the n-gram feature set to reflect the context is open to further investigation. In this sense, creating graphs using discriminative n-gram series with high classification power has never been fully exploited by researchers. Hence, the main goal of this study is to contribute to the classification power by including the long-range neighborhood relationships for each word in the word embedding representations. To achieve this goal, we transformed the textual data by employing n-gram series into a graph structure and then trained a graph convolution network model. Consequently, we obtained contextually enriched word embeddings and observed F1-score performance improvements from 0.78 to 0.80 when we integrated those convolution-based word embeddings into an LSTM model. This research contributes to improving classification capabilities by leveraging graph structures derived from discriminative n-gram series.

Conference Object
Citation - Scopus: 8
On Comparative Classification of Relevant COVID-19 Tweets (Institute of Electrical and Electronics Engineers Inc., 2021-09-15)
Bakal, Gokhan; Abar, Orhan
Due to the impressive information dissemination power of social networks such as Twitter, people tend to check social networks and Web pages more than other traditional news sources, including newspapers, TV news programs, or radio channels. In that sense, the information carried by the content of the shared social media posts becomes much more significant. However, most of the posts are either irrelevant or inaccurate. Moreover, an even more critical issue than the correctness of the information is the speed at which it spreads on Twitter through replies and retweets.
These activities further complicate the situation because of the unregulated nature of social networks and the lack of an immediate verification mechanism for the correctness of posts. In the current Covid-19 pandemic period (causing the coronavirus disease), Twitter is one of the most utilized information resources apart from the official health administration institutions. Consequently, examining the correctness of information related to the Covid-19 pandemic by computational techniques (e.g., Data Mining, Machine Learning, and Deep Learning) has been gaining popularity and remains a substantial task. Hence, we mainly focused on analyzing the correctness of the posts related to the current pandemic shared on the Twitter platform. Therefore, the overall goal of this work is to classify the relevant tweets using linear and non-linear machine learning models. We achieved the best F1 performance score (99%) with the neural network model using the unigram features and a threshold value of 50 among all model configurations. © 2022 Elsevier B.V., All rights reserved.

Conference Object
Citation - Scopus: 1
NLP-Driven Fake News Detection: A Machine Learning Perspective (IEEE, 2025-05-23)
Coban, Mert Korkut; Bakal, Gokhan
The rapid spread of fake news poses a significant challenge, impacting public opinion, decision-making, and societal trust. This study explores the application of Natural Language Processing (NLP) and Machine Learning (ML) techniques for robust fake news detection. Using datasets such as ISOT Fake News, WELFake, and Football Fake News, the project employs advanced preprocessing methods and feature extraction techniques, including TF-IDF, Word2Vec, and GloVe. A comprehensive evaluation of machine learning models - Random Forest, Support Vector Machines (SVM), and Neural Networks - was conducted to identify the optimal configuration.
Results demonstrate that Random Forest with TF-IDF excels in in-domain detection, achieving an F1-score of 99.70%, while Neural Networks paired with Word2Vec and GloVe embeddings perform best in cross-dataset scenarios. The study highlights the importance of dataset size, domain relevance, and feature representation in achieving high generalizability. These findings provide a scalable framework for combating misinformation on digital platforms.

Conference Object
Multi-Method Text Summarization: Evaluating Extractive and BART-Based Approaches on CNN/Daily Mail (Institute of Electrical and Electronics Engineers Inc., 2025-06-27)
Inal, Yasin; Bakal, Gokhan; Esit, Muhammed
With the exponential growth of digital content, efficient text summarization has become increasingly crucial for managing information overload. This paper presents a comprehensive approach to text summarization using both extractive and abstractive methods, implemented on the CNN/Daily Mail dataset. We leverage pre-trained BART (Bidirectional and AutoRegressive Transformers) models and fine-tuning techniques to generate high-quality summaries. Our approach demonstrates significant improvements, with our best model trained on 287k samples achieving ROUGE-1 F1 scores of 0.4174, ROUGE-2 F1 scores of 0.1932, and ROUGE-L F1 scores of 0.2910. We provide detailed comparisons between extractive methods and various BART model configurations, analyzing the impact of training dataset size and model architecture on summarization quality. Additionally, we share our implementation through an open-source NLP toolkit to facilitate further research and practical applications in the field.
© 2025 Elsevier B.V., All rights reserved.

Conference Object
Citation - Scopus: 1
Improving Salary Offer Processes With Classification Based Machine Learning Models (Institute of Electrical and Electronics Engineers Inc., 2024-09-21)
Kaya, Rukiye; Saatci, Mehtap; Bakal, Gokhan; Bakal, Mehmet Gokhan
In job applications, salary is a major motivational factor for employees, and making accurate salary predictions is crucial for both employers and employees. Utilizing advanced technologies can significantly enhance the accuracy and efficiency of the salary prediction process. In this study, we explore Machine Learning (ML) methods to enhance the salary prediction process. We evaluated seven classification models for predicting salary categories, with the Artificial Neural Network (ANN) model achieving the highest accuracy at 58.2% on the test dataset, followed by the K-Nearest Neighbors (KNN) model with an accuracy of 56.8%. Additionally, we employed ensemble models to further enhance prediction accuracy. Among these, the Majority Voting Classifier using Hard Voting achieved the highest accuracy at 59.3%, demonstrating the potential of ensemble techniques in refining salary predictions. The developed salary prediction tool estimates the most appropriate salary category for each candidate and helps mitigate potential biases in manual salary assessments, and hence enables a more objective and consistent compensation system. © 2024 Elsevier B.V., All rights reserved.

Conference Object
Graph-Based Biomedical Knowledge Discovery (IEEE, 2024-05-15)
Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gokhan
The digitalization process is progressing at a very high speed all over the world. While this situation provides many conveniences in today's life, it also brings the problem of analyzing and processing huge volumes of digital data.
This also applies to published academic studies. In this sense, evaluating each study to access previously unknown information within the studies is a very laborious process. For this reason, in this study, the publications obtained for the target diseases were analyzed by text analysis processes and converted into a graph structure that enables the linking of meaningful terms through biomedical relationships. On the dense graph structure obtained, pairs of biomedical entities connected by important links such as treats, causes, and associated_with were queried. The entity pairs returned by the queries were also confirmed by manual search to be real connections. In this study, retrieval of known biomedical entities with the proposed approach solved the time-consuming manual search problem. There is also the potential to obtain unknown/unexplored possible new relationships (e.g., therapeutic, causal, etc.) with multiple binary linking patterns.

Conference Object
Citation - Scopus: 1
From Traditional to Deep: Evaluating Sentiment Analysis Models on a Large-Scale Tweet Dataset (Institute of Electrical and Electronics Engineers Inc., 2024-10-26)
Mammadov, Alisahib; Bakal, Gokhan
This study investigates the effectiveness of various machine learning (ML) and deep learning (DL) techniques for large-scale sentiment analysis on Twitter data. We leverage a publicly available dataset of one million tweets, annotated with four sentiment labels (positive, negative, uncertainty, and litigious), to train and evaluate a range of models. Our experiments demonstrate that traditional ML algorithms, particularly XGBoost, achieve high performance, with the best F1 score reaching 95.81% using a combination of unigrams and bigrams. Among DL models, a hybrid CNN-BiGRU architecture yields the highest average F1 score of 95.42%.
Our findings highlight the strengths of different approaches for sentiment analysis on Twitter data and emphasize the importance of data preprocessing and model selection for achieving optimal performance. © 2025 Elsevier B.V., All rights reserved.
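
Several of the abstracts listed here report F1 scores obtained with unigram and bigram text features. As a rough illustration of what those two ingredients mean, the sketch below counts unigram and bigram features for a text and computes per-class and macro-averaged F1 from label lists. It is not taken from any of the listed papers: the function names (ngram_features, f1_per_class, macro_f1), the whitespace tokenization, and the choice of macro averaging are all assumptions made for this example.

```python
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default).

    Assumption: simple lowercase whitespace tokenization; the listed
    papers may preprocess text differently.
    """
    tokens = text.lower().split()
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(ngram_features("not good at all"))
    truth = ["pos", "pos", "neg", "neg"]
    preds = ["pos", "neg", "neg", "neg"]
    print(f1_per_class(truth, preds))
    print(macro_f1(truth, preds))
```

In practice the feature counts would be fed to a classifier such as those named in the abstracts; that step is omitted here because the abstracts do not specify the configuration.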
