Building a challenging medical dataset for comparative evaluation of classifier capabilities

Bozkurt, Berat; Coskun, Kerem; Bakal, Gokhan

dc.contributor.author	Bozkurt, Berat
dc.contributor.author	Coskun, Kerem
dc.contributor.author	Bakal, Gokhan
dc.contributor.authorID	0000-0003-2897-3894	en_US
dc.contributor.department	AGÜ, Faculty of Engineering, Department of Computer Engineering	en_US
dc.contributor.institutionauthor	Bozkurt, Berat
dc.contributor.institutionauthor	Coskun, Kerem
dc.contributor.institutionauthor	Bakal, Gokhan
dc.date.accessioned	2024-08-20T12:11:10Z
dc.date.available	2024-08-20T12:11:10Z
dc.date.issued	2024	en_US
dc.description.abstract	Since the 2000s, digitalization has been a crucial transformation in our lives. It has also produced large volumes of unstructured textual data to be processed, including articles, clinical records, web pages, and social media posts. Text classification, a key analysis task, assigns the given textual entities to their correct categories. Categorizing documents from different domains is straightforward since the instances are unlikely to contain similar contexts. However, document classification within a single domain is more complicated because the documents share the same context. Thus, we aim to classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) by constructing machine learning and deep learning models. We used 383,914 medical articles about these four cancer types, collected through the PubMed API. To build classification models, we split the dataset into 70% for training, 20% for testing, and 10% for validation. We built widely used machine learning models (Logistic Regression, XGBoost, CatBoost, and Random Forest classifiers) and modern deep learning models (convolutional neural networks, CNN; long short-term memory, LSTM; and gated recurrent unit, GRU). To evaluate the models, we computed the average classification performance (precision, recall, F-score) over ten distinct dataset splits. The best-performing deep learning model(s) yielded an F1 score of 98%. However, traditional machine learning models also achieved reasonably high F1 scores, with 95% in the worst-performing case. Ultimately, we constructed multiple models to classify articles that compose a hard-to-classify dataset in the medical domain.	en_US
dc.identifier.endpage	8	en_US
dc.identifier.issn	00104825
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://doi.org/10.1016/j.compbiomed.2024.108721
dc.identifier.uri	https://hdl.handle.net/20.500.12573/2339
dc.identifier.volume	178	en_US
dc.language.iso	eng	en_US
dc.publisher	ELSEVIER	en_US
dc.relation.isversionof	10.1016/j.compbiomed.2024.108721	en_US
dc.relation.journal	Computers in Biology and Medicine	en_US
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Text mining	en_US
dc.subject	Classification	en_US
dc.subject	Machine learning	en_US
dc.subject	Deep learning	en_US
dc.title	Building a challenging medical dataset for comparative evaluation of classifier capabilities	en_US
dc.type	article	en_US
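
The evaluation protocol described in the abstract (a 70/20/10 train/test/validation split, with precision, recall, and F-score averaged over ten distinct splits) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the use of the split index as a random seed, the macro averaging across the four classes, and the `train_and_predict` callback are all assumptions introduced here, since the record does not specify them.

```python
import random
from collections import Counter

# The four classes named in the abstract.
LABELS = ["Leukemia", "Non-Hodgkin Lymphoma", "Bladder Cancer", "Thyroid Cancer"]


def split_70_20_10(items, seed):
    """Shuffle a copy of items and cut it into train/test/validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    n_test = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # the remaining ~10%
    return train, test, val


def macro_scores(y_true, y_pred, labels):
    """Return macro-averaged (precision, recall, F1); macro is an assumption."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def average_over_splits(items, train_and_predict, n_splits=10):
    """Average test-set macro scores over n_splits seeded splits.

    items is a list of (text, label) pairs; train_and_predict is a
    hypothetical callback (train, val, test_texts) -> predicted labels.
    """
    totals = [0.0, 0.0, 0.0]
    for seed in range(n_splits):
        train, test, val = split_70_20_10(items, seed)
        y_pred = train_and_predict(train, val, [text for text, _ in test])
        y_true = [label for _, label in test]
        for i, score in enumerate(macro_scores(y_true, y_pred, LABELS)):
            totals[i] += score
    return tuple(total / n_splits for total in totals)
```

Any of the classifiers listed in the abstract could be wrapped as the `train_and_predict` callback; the sketch only fixes the splitting and averaging steps, which are the parts the abstract states explicitly.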

Files

Original bundle

Name: 1-s2.0-S0010482524008060-main.pdf
Size: 1.12 MB
Format: Adobe Portable Document Format
Description: Article file

Collections

Department of Computer Engineering Collection
PubMed Indexed Publications Collection
Scopus Indexed Publications Collection
WoS Indexed Publications Collection