WoS İndeksli Yayınlar Koleksiyonu

Permanent URI for this collectionhttps://hdl.handle.net/20.500.12573/394

Browse

Search Results

Now showing 1 - 5 of 5

Citation - Scopus: 1
eTNT: Enhanced Textnettopics With Filtered LDA Topics and Sequential Forward / Backward Topic Scoring Approaches
(Science and Information Organization, 2024) Voskergian, Daniel; Jayousi, Rashid; Bakir-Güngör, Burcu
TextNetTopics is a novel text classification-based topic modelling approach that focuses on topic selection rather than individual word selection to train a machine learning algorithm. However, one key limitation of TextNetTopics is its scoring component, which evaluates each topic in isolation and ranks them accordingly, ignoring the potential relationships between topics. In addition, the chosen topics may contain redundant or irrelevant features, potentially increasing the feature set size and introducing noise that can degrade the overall model performance. To address these limitations and improve the classification performance, this study introduces an enhancement to TextNetTopics. eTNT integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. Moreover, it incorporates a filtering component that aims to enhance topics' quality and discriminative power by removing non-informative features from each topic using Random Forest feature importance values. These integrations aim to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained from the WOS-5736, LitCovid, and MultiLabel datasets provide valuable insights into the superior effectiveness of eTNT compared to its counterpart, TextNetTopics. © 2024 Elsevier B.V., All rights reserved.
Citation - WoS: 1
Citation - Scopus: 1
Textnettopics-SFTS-SBTS Textnettopics Scoring Approaches Based Sequential Forward and Backward
(Springer International Publishing AG, 2024) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is a text classification-based topic modeling approach that performs topic selection rather than word selection to train a machine learning algorithm. However, one main limitation of TextNetTopics is that its scoring component (the S component) assesses each topic independently and ranks them accordingly, neglecting the potential relationship between topics. In order to address this limitation and improve the classification performance, this study introduces an enhancement to TextNetTopics. TextNetTopics-SFTS-SBTS integrates two novel scoring approaches: Sequential Forward Topic Scoring (SFTS) and Sequential Backward Topic Scoring (SBTS), which consider topic interactions by assessing sets of topics simultaneously. This integration aims to streamline the topic selection process and enhance classifier efficiency for text classification. The results obtained across three datasets offer valuable insights into the context-dependent effectiveness of the new scoring mechanisms across diverse datasets and varying numbers of topics involved in the analysis.
Citation - WoS: 11
Citation - Scopus: 16
Textnettopics Pro, a Topic Model-Based Text Classification for Short Text by Integration of Semantic and Document-Topic Distribution Information
(Frontiers Media S.A., 2023-10-05) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category. Article titles are concise descriptions of the articles' content with valuable information that can be useful in document classification and categorization. However, shortness, data sparseness, limited word occurrences, and the inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms on these short texts, making their classification a challenging task. This study firstly explores the performance of our earlier study, TextNetTopics on the short text. Secondly, here we propose an advanced version called TextNetTopics Pro, which is a novel short-text classification framework that utilizes a promising combination of lexical features organized in topics of words and topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate our proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the Biomedical field, and the other one is related to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without using the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling the imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers.
TextNetTopics+: Enhancing Text Classification Through Classifier Diversity and Model Ensembling
(Springer International Publishing AG, 2025) Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik
TextNetTopics is an innovative text classification framework that integrates topic modeling with feature selection to improve model accuracy and interpretability. Unlike traditional methods that rely on individual words, TextNetTopics selects cohesive topics extracted via Latent Dirichlet Allocation as features for document representation, effectively reducing dimensionality while preserving the semantic structure of the text. This study evaluates the performance of TextNetTopics utilizing multiple machine learning algorithms in the M (Modeling) component, including Random Forest, Support Vector Machine, Gradient Boosting, eXtreme Gradient Boosting, and Logistic Regression. To further enhance classification performance, we introduce TextNetTopics+, an ensemblebased extension that leverages both hard voting and soft voting mechanisms to combine the strengths of multiple classifiers. Comprehensive experiments on the LitCovid and WOS datasets demonstrate that ensemble learning in TextNetTopics + significantly outperforms individual classifiers in TextNetTopics, confirming its effectiveness in improving model robustness and generalization.
Email Clustering & Generating Email Templates Based on Their Topics
(Assoc Computing Machinery, 2021-05-27) Coskun, Fatih; Gezer, Cengiz; Gungor, V. Cagri
Email templates have a significant impact on users in terms of productivity. Using an email template that is produced successfully is going to transfer the main information with a considerable impression. While the previous studies were focused on the email generation by text-differences in the content of the emails, generated templates based on email topics can provide better productivity for the companies. This article proposes a system, in which user emails are clustered according to the topics of the emails, and introduces an email template generation system that utilizes the sample emails belonging to the formed email clusters. For this purpose, the Enron email dataset has been used and the performance of different text preprocessing and topic modeling algorithms, such as DMM, GPU-DMM, GPU-PDMM, LF-DMM, LDA, LF-LDA, BTM, WNTM, PTM, SATM, have been investigated and compared to determine the most efficient one. After obtaining the email topics, the system shows the examples of the emails representing the selected topics and enables the authorized users to create templates that generalize these topics.

WoS İndeksli Yayınlar Koleksiyonu

Browse

Filters

Settings

Sort By

Results per page

Search Results