TextNetTopics Pro, a Topic Model-Based Text Classification for Short Text by Integration of Semantic and Document-Topic Distribution Information

Voskergian, Daniel; Bakir-Gungor, Burcu; Yousef, Malik

doi:10.3389/fgene.2023.1243874

Files

fgene-14-1243874.pdf (3.21 MB)

Date

2023-10-05

Authors

Voskergian, Daniel

Bakir-Gungor, Burcu

Yousef, Malik

Publisher

Frontiers Media S.A.

Open Access Color

GOLD

Green Open Access

Yes

OpenAIRE Downloads

70

OpenAIRE Views

114

Publicly Funded

No

Impulse

Top 10%

Influence

Average

Popularity

Top 10%

Abstract

With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to predefined categories. Article titles are concise descriptions of the articles' content and carry valuable information for document classification and categorization. However, the shortness, data sparseness, limited word occurrences, and inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms to these short texts, making their classification a challenging task. This study first explores the performance of our earlier approach, TextNetTopics, on short text. Second, we propose an advanced version called TextNetTopics Pro, a novel short-text classification framework that combines lexical features organized in topics of words with the topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate the proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is from the biomedical field, and the second consists of computer science publications. Additionally, we compare the predictive performance of the models generated with and without the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers.
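
As a rough illustration of the feature combination the abstract describes (lexical features drawn from topics of words, joined with a document-topic distribution), the following minimal Python sketch may help. It is not the authors' implementation: the function name `build_feature_vector`, the binary word-presence encoding, and the toy topics and probabilities are all assumptions made for illustration, and the document-topic distribution is simply taken as given rather than produced by a real short-text topic model.

```python
from typing import Dict, List, Sequence


def build_feature_vector(
    tokens: Sequence[str],
    selected_topics: Dict[int, List[str]],
    doc_topic_dist: Sequence[float],
) -> List[float]:
    """Concatenate lexical features with a document-topic distribution.

    tokens          -- tokenized short text (e.g. an article title)
    selected_topics -- hypothetical mapping: topic id -> words in that topic
    doc_topic_dist  -- hypothetical per-topic probabilities for this document,
                       assumed to come from some topic model
    """
    # Lexical part: one binary feature per word appearing in any selected topic.
    vocabulary = sorted({w for words in selected_topics.values() for w in words})
    token_set = set(tokens)
    lexical = [1.0 if word in token_set else 0.0 for word in vocabulary]

    # Topic part: the document-topic distribution, appended unchanged.
    return lexical + list(doc_topic_dist)


# Toy example (all values are made up for illustration).
title_tokens = ["drug", "induced", "liver", "injury"]
topics = {0: ["liver", "hepatic"], 1: ["network", "learning"]}
topic_distribution = [0.9, 0.1]

features = build_feature_vector(title_tokens, topics, topic_distribution)
# Vocabulary order is ["hepatic", "learning", "liver", "network"], so
# features == [0.0, 0.0, 1.0, 0.0, 0.9, 0.1]
```

The appended topic probabilities give a classifier a dense signal even when a short title shares few words with the selected topics, which is the intuition behind using them against data sparseness.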

Description

Voskergian, Daniel/0009-0005-7544-9210

ORCID

Voskergian, Daniel

Keywords

Text Classification, Feature Selection, Topic Selection, Topic Projection, Topic Modeling, Short Text, Sparse Data, Genetics, QH426-470

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

WoS Q

Q2

Scopus Q

Q2

OpenCitations Citation Count

9

Source

Frontiers in Genetics

Volume

14

URI

https://doi.org/10.3389/fgene.2023.1243874
https://hdl.handle.net/20.500.12573/4763

Collections

WoS Indexed Publications Collection
PubMed Indexed Publications Collection
Scopus Indexed Publications Collection

PlumX Metrics

Citations

Scopus : 16

PubMed : 2

Captures

Mendeley Readers : 20


OpenAlex FWCI

1.71