Browsing by Author "Bakal, Gokhan"
Now showing 1 - 11 of 11
[Article] Beyond visual cues: Emotion recognition in images with text-aware fusion (ELSEVIER, 2025)
Sungur, Kerim Serdar; Bakal, Gokhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Sungur, Kerim Serdar; Bakal, Gokhan
Sentiment analysis is a widely studied problem for understanding human emotions and their potential outcomes. Although it is commonly performed on textual data, analyzing visual data is also essential for assessing a person's current emotional state. This work investigates whether sentiment predictions on images can be improved by integrating textual data as additional knowledge that reflects the images' context. To this end, two separate models were developed, an image-processing model and a text-processing model, each trained on a distinct dataset covering the same five human emotions. The outputs of each model's last dense layer were then combined to build a hybrid multimodal model with visual and textual components. The main focus is evaluating this hybrid model, in which textual knowledge is concatenated with visual data. The hybrid model achieved an F1-score improvement of nearly 3% over the plain image classification model based on a convolutional neural network (CNN) architecture. Overall, this research underscores the value of fusing textual context with visual information to refine sentiment analysis predictions.
The findings not only emphasize the potential of a multimodal approach but also point to a promising avenue for future advances in emotion analysis and understanding.

[Article] Building a challenging medical dataset for comparative evaluation of classifier capabilities (ELSEVIER, 2024)
Bozkurt, Berat; Coskun, Kerem; Bakal, Gokhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bozkurt, Berat; Coskun, Kerem; Bakal, Gokhan
Since the 2000s, digitalization has profoundly transformed our lives. However, it also produces a large volume of unstructured textual data to be processed, including articles, clinical records, web pages, and social media posts. Classification, a key analysis task, assigns given textual entities to their correct categories. Categorizing documents from different domains is relatively straightforward, since the instances are unlikely to share similar contexts. Classifying documents within a single domain is harder because they share the same context. We therefore classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) using machine learning and deep learning models. We used 383,914 medical articles on these four cancer types, collected through the PubMed API. To build the classification models, we split the dataset into 70% training, 20% testing, and 10% validation. We built widely used machine learning models (Logistic Regression, XGBoost, CatBoost, and Random Forest classifiers) and modern deep learning models (convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU)). We evaluated the models by averaging their classification performance (precision, recall, F-score) over ten distinct dataset splits. The best-performing deep learning model(s) achieved an F1 score of 98%.
However, traditional machine learning models also achieved reasonably high F1 scores, 95% in the worst-performing case. Ultimately, we constructed multiple models to classify articles that together form a hard-to-classify dataset in the medical domain.

[Conference Object] A Comparative Analysis on Medical Article Classification Using Text Mining & Machine Learning Algorithms (Institute of Electrical and Electronics Engineers Inc., 2021)
Kolukisa, Burak; Dedeturk, Bilge Kagan; Dedeturk, Beyhan Adanur; Gulşen, Abdulkadir; Bakal, Gokhan; 0000-0003-0423-4595; 0000-0002-4250-2880; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Kolukisa, Burak; Gulşen, Abdulkadir; Bakal, Gokhan; Dedeturk, Beyhan Adanur
Document classification is a widely studied research task across multiple domains. Its core motivation is that manual classification is impractical given exponentially growing document volumes, so automated computational approaches, such as machine learning models combined with data and text mining techniques, are strongly needed. In this study, we focused on classifying medical articles on common cancer types, given the significance of the field and the sizable number of relevant documents available. We deliberately targeted MEDLINE articles about common cancer types because most cancer types share a similar literature composition, which makes classification relatively more difficult. To this end, we built multiple machine learning models, including both traditional and deep learning architectures. The LSTM model achieved the best performance (≈82% F-score).
Overall, our results demonstrate the strong benefit of combining text mining and machine learning methods to distinguish medical articles on common cancer types.

[Conference Object] A Computational Drug Repositioning Effort using Patients' Reviews Dataset (Institute of Electrical and Electronics Engineers Inc., 2023)
Akkaya, Ali; Bakal, Gokhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Akkaya, Ali; Bakal, Gokhan
Drug discovery is a core concern of the medical and, in particular, pharmaceutical disciplines. By its nature, each phase of the process demands substantial time, clinical experimentation, and budget. Because many attempts fail for reasons such as a lack of participants, financial problems, or ineffective results, computational drug discovery can shorten the process by proposing plausible candidates. In this study, the goal is to identify plausible candidate drugs for diseases. To do so, we use a dataset of patients' personal experiences with drugs. In addition to their comments, users give each drug a rating between 1 and 10. To ensure dataset quality, we first performed sentiment analysis experiments to verify that the reviews are consistent with their rating scores. Then, only review pairs with an effectiveness rating of 6 or higher were selected as pre-filtered drug-disease pairs. We also built a knowledge graph from treatment-related biomedical relations, using predications from the Semantic Medline Database, to identify drug similarities with the SimRank similarity algorithm.
As a result, we report a list of plausible drugs as repurposing/repositioning candidates for further experiments.

[Article] Document Classification with Contextually Enriched Word Embeddings (Bajece (İstanbul Teknik Ünv), 2024)
Mahmood, Raad Saadi; Bakal, Gokhan; Akbas, Ayhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakal, Mehmet Gokhan
Text classification serves a wide range of application domains and purposes, such as classifying articles, social media posts, and sentiments. As natural language processing problems, such tasks are intensively tackled with machine learning and deep learning techniques. One common approach uses discriminative word features, such as Bag-of-Words and n-grams, for text classification experiments. Another powerful approach uses neural network-based (specifically, deep learning) models at the sentence, word, or character level. In this study, we propose a novel approach that classifies documents with word embeddings contextually enriched by neighboring words, obtained through trigram word sequences. In the experiments, the well-known Web of Science dataset is used to evaluate the proposed models. We built various models with and without the proposed approach to compare their performance. The experiments showed that the proposed neighborhood-based word embedding enrichment has good potential for use in further studies.

[Article] An empirical study of sentiment analysis utilizing machine learning and deep learning algorithms (SPRINGER, 2023)
Erkantarci, Betul; Bakal, Gokhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Erkantarci, Betul; Bakal, Gokhan
Among text mining studies, one of the most studied topics is text classification, applied in various domains including medicine, social media, and academia.
As a subproblem of text classification, sentiment analysis has been widely investigated for classifying textual elements that are often opinion-based. Specifically, user reviews and experiential feedback on products or services have served as fundamental data sources for sentiment analysis efforts. With rapid technological advances, social media platforms such as Twitter, Facebook, and Reddit have become central opinion-sharing media since the early 2000s. In this work, we build various machine learning models to address sentiment analysis on a Reddit comments dataset. The models we constructed achieve F1 scores in the range of 73–76%. We present comparative performance scores obtained by traditional machine learning and deep learning models and discuss the results.

[Article] ENHANCING DEEP LEARNING PERFORMANCE THROUGH A GENETIC ALGORITHM-ENHANCED APPROACH: FOCUSING ON LSTM (Kahramanmaraş Sütçü İmam Üniversitesi, 2024)
Şen, Tarık Üveys; Bakal, Gokhan; 0009-0000-0297-6064; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Şen, Tarık Üveys; Bakal, Gokhan
Deep learning has shown remarkable success in various applications, such as image classification, natural language processing, and speech recognition. However, training deep neural networks is challenging due to their complex architectures and large number of parameters. Genetic algorithms have been proposed as an alternative optimization technique for deep learning, offering an efficient way to search for a set of network parameters that minimizes the objective function. In this paper, we propose a novel approach that integrates genetic algorithms with deep learning, specifically LSTM models, to enhance performance. Our method uses genetic algorithms to optimize crucial hyperparameters, including learning rate, batch size, number of neurons per layer, and number of layers.
Additionally, we conduct a comprehensive analysis of how genetic algorithm parameters influence the optimization process and illustrate their significant impact on improving LSTM model performance. Overall, the presented method provides a powerful mechanism for improving the performance of deep neural networks, and thus we believe it has significant potential for future applications in artificial intelligence.

[Conference Object] Graph-based Biomedical Knowledge Discovery (IEEE, 2024)
Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gokhan; 0000-0003-2897-3894; 0000-0002-2272-6270; AGÜ, Faculty of Engineering, Department of Electrical and Electronics Engineering; Altuner, Osman; Bakir-Gungor, Burcu; Bakal, Gökhan
Digitalization is advancing at a very high pace worldwide. While this brings many conveniences to modern life, it also raises the problem of analyzing and processing the enormous volume of digital data it produces. The same holds for published academic studies: evaluating each study to reach the novel information it contains is a highly laborious process. Therefore, in this study, publications retrieved for target diseases were analyzed with text analysis processes and converted into a graph structure that links meaningful terms through biomedical relations. On the resulting dense graph, pairs of biomedical entities with important connections such as treats, causes, and associated_with were queried. The entity pairs returned by the queries were also checked through manual search and confirmed to be real connections. This work aims to solve the time-consuming manual search problem by retrieving known biomedical entity relations with the proposed approach.
Moreover, the approach has the potential to reveal unknown or undiscovered relations (treats, causes, associated_with, etc.) through patterns of multiple pairwise connections.

[Article] Machine Learning based Network Intrusion Detection with Hybrid Frequent Item Set Mining (GAZİ ÜNİVERSİTESİ, 2024)
Fırat, Murat; Bakal, Gokhan; Akbas, Ayhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakal, Gokhan
As computer networks grow and expand day by day and the diversity of software increases, the damage that potential attacks can cause is growing beyond expectations. Intrusion Detection Systems (IDS) are practical defense tools against these constantly growing and diversifying attacks. Accordingly, an emerging approach among researchers is to train these systems with various artificial intelligence methods so that they detect subsequent attacks in real time and enable the necessary precautions. The ultimate goal of this study is to propose a hybrid feature selection approach that improves classification performance. The raw dataset originally contained 85 descriptor features (attributes) for classification, extracted with CICFlowMeter from a PCAP file in which network traffic was recorded for data curation. Classical feature selection methods and frequent item set mining approaches were combined for feature selection to construct a hybrid model. We examined the effect of the proposed hybrid feature selection approach on classifying network traffic data containing normal and attack records.
The outcomes show that the proposed method yielded an improvement of nearly 3% with the Logistic Regression algorithm when classifying more than 225,000 records.

[Conference Object] On Comparative Classification of Relevant Covid-19 Tweets (Institute of Electrical and Electronics Engineers Inc., 2021)
Bakal, Gokhan; Abar, Orhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Bakal, Gokhan
Owing to the information dissemination power of social networks such as Twitter, people tend to check social networks and Web pages more than traditional news sources such as newspapers, TV news programs, or radio channels. As a result, the information carried by shared social media posts becomes far more significant. However, most posts are either irrelevant or inaccurate. Even more critical than the correctness of the information is how fast it spreads on Twitter through replies and retweets. These activities further complicate the situation, given the unregulated nature of social networks and the lack of an immediate mechanism for verifying posts. During the current Covid-19 pandemic, Twitter has been one of the most used information sources apart from official health authorities. Consequently, examining the correctness of Covid-19-related information with computational techniques (e.g., data mining, machine learning, and deep learning) has been gaining popularity and remains an important task. We therefore focused on analyzing the correctness of pandemic-related posts shared on Twitter. The overall goal of this work is to classify relevant tweets using linear and non-linear machine learning models.
Among all model configurations, the neural network model using unigram features and a threshold value of 50 achieved the best F1 score (99%).

[Conference Object] A Transfer Learning Application on the Reliability of Psychological Drugs' Comments (Institute of Electrical and Electronics Engineers Inc., 2023)
Sen, Tarik Uveys; Bakal, Gokhan; 0000-0003-2897-3894; AGÜ, Faculty of Engineering, Department of Computer Engineering; Sen, Tarik Uveys; Bakal, Gokhan
As digitalization and the Internet continue to gain popularity, the accuracy of personal reviews and opinions becomes a critical issue. This applies particularly to patients taking psychological drugs, for whom accurate information is crucial to other patients and medical professionals. In this study, we analyze drug reviews from drugs.com to evaluate the effectiveness of reviews for psychological drugs. Our dataset includes over 200,000 drug reviews, which we labeled as positive, negative, or neutral according to their rating scores. We apply machine learning (ML) models, including Logistic Regression, Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) algorithms, to predict the sentiment class of each review. The LSTM model achieved a weighted F1 score of 85.3%. By applying transfer learning, we further improved the LSTM model's F1 score by nearly 3%. Our findings show that there is no contextual difference between comments made by patients with psychological disorders and those made by patients with other diseases.
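The abstract above labels reviews by rating score and reports a weighted F1 score. The following is a minimal Python sketch of both steps. It assumes a 1-10 rating scale and uses hypothetical cut-offs, because the abstract does not say which ratings map to which class. The names `rating_to_label` and `f1_weighted` are illustrative, and this is not the authors' code.

```python
from collections import Counter


def rating_to_label(rating: int) -> str:
    """Map a 1-10 rating to a sentiment class.

    ASSUMPTION: these cut-offs are hypothetical; the source does not state them.
    """
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating <= 4:
        return "negative"
    if rating <= 6:
        return "neutral"
    return "positive"


def f1_weighted(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn, the number of true instances of this class
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


# Toy example: labels derived from ratings, compared with made-up predictions.
ratings = [9, 2, 5, 10, 7, 1]
y_true = [rating_to_label(r) for r in ratings]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "negative"]
print(round(f1_weighted(y_true, y_pred), 4))  # 0.6667
```

For this toy data, the positive class (support 3) scores 2/3 and the negative class (support 2) scores 1. The neutral class (support 1) scores 0, so the weighted average is 4/6. Weighting by support keeps a small class such as neutral from dominating the average, which is why weighted F1 is a common choice for imbalanced review data.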