WoS-Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/20.500.12573/394

Search Results

Now showing 1 - 8 of 8
  • Article
    AI-Driven Drug Repositioning: A Diffusion Model Approach on Knowledge Graphs
    (Elsevier, 2026) Erkantarci, Betul; Şen, Tarık Üveys; Bakal, Gokhan
    Drug repositioning - discovering new therapeutic applications for existing drugs - offers a promising pathway to accelerate cancer treatment development. This study proposes a diffusion model-driven framework that leverages biomedical knowledge graphs and graph-based learning to enhance drug repositioning predictions. The framework integrates data from the Semantic MEDLINE Database (SemMedDB), the Unified Medical Language System (UMLS), and the Repurposing Drugs Database (RepoDB) to construct a comprehensive therapeutic knowledge graph. Drug embeddings are generated using a one-layer Relational Graph Convolutional Network (R-GCN) incorporating semantic type-guided structural perturbations. These embeddings are refined through a flow-matching algorithm to denoise and reconstruct biologically meaningful representations. To evaluate the model's effectiveness, we apply a consensus strategy using Cosine Similarity, Euclidean Distance, and Manhattan Distance as proximity metrics. The model successfully identified, on average, 74 candidate drugs for repositioning in the context of leukemia. Qualitative analysis using t-distributed stochastic neighbor embedding (t-SNE) revealed enhanced clustering of pharmacologically relevant drugs in the denoised embedding space. Trastuzumab, in particular, emerged as a strong repositioning candidate for leukemia, supported by 156 co-mentions in PubMed. These findings demonstrate that the proposed framework improves embedding robustness and semantic fidelity, offering a powerful artificial intelligence (AI)-driven approach for precision oncology. Integrating structural noise modeling with diffusion-based denoising advances the discovery of novel drug-disease associations and holds potential for translational research and clinical hypothesis generation in drug repurposing.
  • Conference Object
    Text Classification Experiments on Contextual Graphs Built by N-Gram Series
    (Springer International Publishing AG, 2025) Sen, Tarik Uveys; Yakit, Mehmet Can; Gumus, Mehmet Semih; Abar, Orhan; Bakal, Gokhan
    Traditional n-gram textual features, commonly employed in conventional machine learning models, offer lower performance rates on high-volume datasets compared to modern deep learning algorithms, which have been intensively studied for the past decade. The main reason for this performance disparity is that deep learning approaches handle textual data through the word vector space representation by catching the contextually hidden information in a better way. Nonetheless, the potential of the n-gram feature set to reflect the context is open to further investigation. In this sense, creating graphs using discriminative n-gram series with high classification power has never been fully exploited by researchers. Hence, the main goal of this study is to contribute to the classification power by including the long-range neighborhood relationships for each word in the word embedding representations. To achieve this goal, we transformed the textual data by employing n-gram series into a graph structure and then trained a graph convolution network model. Consequently, we obtained contextually enriched word embeddings and observed F1-score performance improvements from 0.78 to 0.80 when we integrated those convolution-based word embeddings into an LSTM model. This research contributes to improving classification capabilities by leveraging graph structures derived from discriminative n-gram series.
  • Conference Object
    Citation - Scopus: 1
    NLP-Driven Fake News Detection: A Machine Learning Perspective
    (IEEE, 2025-05-23) Coban, Mert Korkut; Bakal, Gokhan
    The rapid spread of fake news poses a significant challenge, impacting public opinion, decision-making, and societal trust. This study explores the application of Natural Language Processing (NLP) and Machine Learning (ML) techniques for robust fake news detection. Using datasets such as ISOT Fake News, WELFake, and Football Fake News, the project employs advanced preprocessing methods and feature extraction techniques, including TF-IDF, Word2Vec, and GloVe. A comprehensive evaluation of machine learning models, namely Random Forest, Support Vector Machines (SVM), and Neural Networks, was conducted to identify the optimal configuration. Results demonstrate that Random Forest with TF-IDF excels in in-domain detection, achieving an F1-score of 99.70%, while Neural Networks paired with Word2Vec and GloVe embeddings perform best in cross-dataset scenarios. The study highlights the importance of dataset size, domain relevance, and feature representation in achieving high generalizability. These findings provide a scalable framework for combating misinformation on digital platforms.
  • Article
    Citation - WoS: 2
    Machine Learning Based Network Intrusion Detection With Hybrid Frequent Item Set Mining
    (Gazi Univ, 2024-10-02) Firat, Murat; Bakal, Gokhan; Akbas, Ayhan; Bakal, Mehmet
    With the daily development and expansion of computer networks and the growing diversity of software, the damage that possible attacks can cause is increasing beyond predictions. Intrusion Detection Systems (IDS) are one of the practical defense tools against these potential attacks, which are constantly growing and diversifying. Thus, one of the emerging methods among researchers is to train these systems with various artificial intelligence methods to detect subsequent attacks in real time and take the necessary precautions. The ultimate goal of this study is to propose a hybrid feature selection approach to improve the classification performance. The raw dataset originally enclosed 85 descriptor features (attributes) for classification. These attributes are extracted using CICFlowMeter from a PCAP file where network traffic is recorded for data curation. In this study, classical feature selection methods and frequent item set mining approaches were employed in feature selection for constructing a hybrid model. We aimed to examine the effect of the proposed hybrid feature selection approach on the classification task for the network traffic data containing ordinary and attack records. The outcomes demonstrate that the proposed method gained nearly 3% improvement when applied with the Logistic Regression algorithm on classifying more than 225,000 records.
  • Article
    Citation - WoS: 6
    Citation - Scopus: 5
    Enhancing Sentiment Analysis in Stock Market Tweets Through Bert-Based Knowledge Transfer
    (Springer, 2025-02-26) Cicekyurt, Emre; Bakal, Gokhan
    One of the widely studied text classification efforts is sentiment analysis. It is a specific examination involving natural language processing and machine learning methods to understand semantic orientation from textual data. Working with social media posts, such as tweets, for sentiment analysis is quite common among researchers due to the speed of information dissemination. In this regard, forecasting stock market tweets is a widely studied research topic. Some studies have revealed a strong connection between sentiment and stock market performance, while others have not found any notable associations. The proposed work shows two distinct approaches to sentiment analysis over the stock market tweets. The first approach employs traditional machine learning algorithms, including logistic regression, random forest, and XGBoost. The second approach constructs deep learning (as a subfield of machine learning) models using LSTM and CNN algorithms to classify the test instances into positive, negative, or neutral classes through ten randomly shuffled data splits. In this study, the labeled data size is gradually increased utilizing a pre-trained model, FinBERT. It is exclusively employed to label unlabeled data instances to integrate them into the experiments. The goal is to monitor the effect of the additional newly-labeled examples on the sentiment analysis performance. The experiments showed that the average F1-score improved by 20% for the deep learning models and 17% for the machine learning models. In the end, the paper reveals a strong positive correlation between training data size and the classification performance of the experimental approaches.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 7
    Combining N-Grams and Graph Convolution for Text Classification
    (Elsevier, 2025-05) Sen, Tarik Uveys; Yakit, Mehmet Can; Gumus, Mehmet Semih; Abar, Orhan; Bakal, Gokhan
    Text classification, a cornerstone of natural language processing (NLP), finds applications in diverse areas, from sentiment analysis to topic categorization. While deep learning models have recently dominated the field, traditional n-gram-driven approaches often struggle to achieve comparable performance, particularly on large datasets. This gap largely stems from deep learning's superior ability to capture contextual information through word embeddings. This paper explores a novel approach to leverage the often-overlooked power of n-gram features for enriching word representations and boosting text classification accuracy. We propose a method that transforms textual data into graph structures, utilizing discriminative n-gram series to establish long-range relationships between words. By training a graph convolution network on these graphs, we derive contextually enhanced word embeddings that encapsulate dependencies extending beyond local contexts. Our experiments demonstrate that integrating these enriched embeddings into a long short-term memory (LSTM) model for text classification leads to around 2% improvements in classification performance across diverse datasets. This achievement highlights the synergy of combining traditional n-gram features with graph-based deep learning techniques for building more powerful text classifiers.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 5
    Beyond Visual Cues: Emotion Recognition in Images With Text-Aware Fusion
    (Elsevier, 2025-04) Sungur, Kerim Serdar; Bakal, Gokhan
    Sentiment analysis is a widely studied problem for understanding human emotions and potential outcomes. Just as it can be performed over textual data, working on visual data is also critical to examining the current emotional state. In this effort, the aim is to investigate any potential enhancements in sentiment analysis predictions through visual instances by integrating textual data as additional knowledge reflecting the contextual information of the images. Thus, two separate models, an image-processing model and a text-processing model, have been developed, both trained on distinct datasets comprising the same five human emotions. Subsequently, the outputs of the individual models' last dense layers are combined to construct the hybrid multimodal model empowered by visual and textual components. The fundamental focus is to evaluate the performance of the hybrid model in which the textual knowledge is concatenated with visual data. Essentially, the hybrid model achieved nearly a 3% F1-score improvement compared to the plain image classification model utilizing convolutional neural network architecture. In essence, this research underscores the potency of fusing textual context with visual information to refine sentiment analysis predictions. The findings not only emphasize the potential of a multi-modal approach but also spotlight a promising avenue for future advancements in emotion analysis and understanding.
  • Article
    Citation - WoS: 16
    Citation - Scopus: 11
    An Empirical Study of Sentiment Analysis Utilizing Machine Learning and Deep Learning Algorithms
    (Springer Nature, 2023-12-09) Erkantarci, Betul; Bakal, Gokhan
    Among text-mining studies, one of the most studied topics is the text classification task applied in various domains, including medicine, social media, and academia. As a sub-problem in text classification, sentiment analysis has been widely investigated to classify textual elements that are often opinion-based. Specifically, user reviews and experiential feedback for products or services have been employed as fundamental data sources for sentiment analysis efforts. As a result of rapidly emerging technological advancements, social media platforms such as Twitter, Facebook, and Reddit have become central opinion-sharing mediums since the early 2000s. In this sense, we build various machine-learning models to solve the sentiment analysis problem on the Reddit comments dataset in this work. The experimental models we constructed achieve F1 scores in the range of 73-76%. Consequently, we present comparative performance scores obtained by traditional machine learning and deep learning models and discuss the results.