TextNetTopics Pro, a Topic Model-Based Text Classification for Short Text by Integration of Semantic and Document-Topic Distribution Information

dc.contributor.author Voskergian, Daniel
dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Yousef, Malik
dc.date.accessioned 2025-09-25T10:58:43Z
dc.date.available 2025-09-25T10:58:43Z
dc.date.issued 2023
dc.description Voskergian, Daniel/0009-0005-7544-9210 en_US
dc.description.abstract With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to predefined categories. Article titles are concise descriptions of the articles' content and carry valuable information that can be useful in document classification and categorization. However, the shortness, data sparseness, limited word occurrences, and inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms to these short texts, making their classification a challenging task. This study first explores the performance of our earlier approach, TextNetTopics, on short text. Second, we propose an advanced version called TextNetTopics Pro, a novel short-text classification framework that combines lexical features organized into topics of words with the document-topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate the proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the biomedical field, and the second to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers. en_US
dc.identifier.doi 10.3389/fgene.2023.1243874
dc.identifier.issn 1664-8021
dc.identifier.scopus 2-s2.0-85174586981
dc.identifier.uri https://doi.org/10.3389/fgene.2023.1243874
dc.identifier.uri https://hdl.handle.net/20.500.12573/4763
dc.language.iso en en_US
dc.publisher Frontiers Media S.A. en_US
dc.relation.ispartof Frontiers in Genetics en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Text Classification en_US
dc.subject Feature Selection en_US
dc.subject Topic Selection en_US
dc.subject Topic Projection en_US
dc.subject Topic Modeling en_US
dc.subject Short Text en_US
dc.subject Sparse Data en_US
dc.title TextNetTopics Pro, a Topic Model-Based Text Classification for Short Text by Integration of Semantic and Document-Topic Distribution Information en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Voskergian, Daniel/0009-0005-7544-9210
gdc.author.scopusid 57200259158
gdc.author.scopusid 25932029800
gdc.author.scopusid 14029389000
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Voskergian, Daniel] Al Quds Univ, Fac Engn, Dept Comp Engn, Jerusalem, Palestine; [Bakir-Gungor, Burcu] Abdullah Gul Univ, Fac Engn, Dept Comp Engn, Kayseri, Turkiye; [Yousef, Malik] Zefat Acad Coll, Dept Informat Syst, Safed, Israel; [Yousef, Malik] Zefat Acad Coll, Galilee Digital Hlth Res Ctr, Safed, Israel en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.volume 14 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q2
gdc.identifier.openalex W4387364902
gdc.identifier.pmid 37867598
gdc.identifier.wos WOS:001086438900001
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 70
gdc.oaire.impulse 9.0
gdc.oaire.influence 3.0261342E-9
gdc.oaire.isgreen true
gdc.oaire.keywords short text
gdc.oaire.keywords sparse data
gdc.oaire.keywords text classification
gdc.oaire.keywords feature selection
gdc.oaire.keywords topic modeling
gdc.oaire.keywords Genetics
gdc.oaire.keywords QH426-470
gdc.oaire.keywords topic selection
gdc.oaire.keywords topic projection
gdc.oaire.popularity 8.938557E-9
gdc.oaire.publicfunded false
gdc.oaire.views 114
gdc.openalex.collaboration International
gdc.openalex.fwci 2.0948
gdc.openalex.normalizedpercentile 0.9
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 9
gdc.plumx.mendeley 15
gdc.plumx.newscount 1
gdc.plumx.pubmedcites 2
gdc.plumx.scopuscites 15
gdc.scopus.citedcount 15
gdc.virtual.author Güngör, Burcu
gdc.wos.citedcount 10
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef

Files

Original bundle

Name: fgene-14-1243874.pdf
Size: 3.3 MB
Format: Adobe Portable Document Format
Description: Article File

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed to upon submission