CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques

dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Temiz, Mustafa
dc.contributor.author Inal, Yasin
dc.contributor.author Cicekyurt, Emre
dc.contributor.author Yousef, Malik
dc.contributor.authorID 0000-0002-2272-6270 en_US
dc.contributor.authorID 0000-0002-2839-1424 en_US
dc.contributor.authorID 0009-0002-4373-8526 en_US
dc.contributor.department AGÜ, Faculty of Engineering, Department of Computer Engineering en_US
dc.contributor.institutionauthor Bakir-Gungor, Burcu
dc.contributor.institutionauthor Temiz, Mustafa
dc.contributor.institutionauthor Inal, Yasin
dc.contributor.institutionauthor Cicekyurt, Emre
dc.date.accessioned 2024-12-04T07:15:17Z
dc.date.available 2024-12-04T07:15:17Z
dc.date.issued 2024 en_US
dc.description.abstract Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy still needs to be improved by taking a holistic view of the underlying biological knowledge. This study evaluates CRC-associated metagenomic data at the species, enzyme, and pathway levels through global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. The population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′,3′-cyclic 3′-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED. en_US
dc.identifier.endpage 14 en_US
dc.identifier.issn 0010-4825
dc.identifier.startpage 1 en_US
dc.identifier.uri https://doi.org/10.1016/j.compbiomed.2024.109098
dc.identifier.uri https://hdl.handle.net/20.500.12573/2397
dc.identifier.volume 182 en_US
dc.language.iso eng en_US
dc.publisher ELSEVIER en_US
dc.relation.isversionof 10.1016/j.compbiomed.2024.109098 en_US
dc.relation.journal Computers in Biology and Medicine en_US
dc.relation.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Biomarkers en_US
dc.subject Colorectal cancer en_US
dc.subject Enzyme en_US
dc.subject Machine learning en_US
dc.subject Metagenomic en_US
dc.subject Microbiome en_US
dc.subject Species en_US
dc.title CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques en_US
dc.type article en_US
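
The abstract mentions two steps that a short sketch may make more concrete: combining the features picked by several selection methods, and leave-one-dataset-out (LODO) evaluation. The Python below is only an illustration under stated assumptions, not the authors' implementation (see the repository linked in the abstract for that). It assumes "combined" means the union of each method's top-k features, and the names lodo_splits, combine_top_k, the cohort labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def lodo_splits(dataset_labels):
    """Yield (held_out, train_idx, test_idx), holding out one dataset at a time.

    dataset_labels[i] names the cohort/dataset that sample i came from.
    """
    by_dataset = defaultdict(list)
    for idx, name in enumerate(dataset_labels):
        by_dataset[name].append(idx)
    for held_out in sorted(by_dataset):
        test_idx = by_dataset[held_out]
        # Train on every sample that does not belong to the held-out dataset.
        train_idx = sorted(
            i
            for name, idxs in by_dataset.items()
            if name != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


def combine_top_k(rankings, k):
    """Union of the k best-scoring features from each selection method.

    rankings maps a method name to a {feature: score} dict (higher is better).
    Taking the union is an assumption; the abstract only says "combined".
    """
    combined = set()
    for scores in rankings.values():
        combined.update(sorted(scores, key=scores.get, reverse=True)[:k])
    return sorted(combined)


# Hypothetical cohort labels for five samples and made-up feature scores.
cohorts = ["A", "A", "B", "C", "B"]
splits = list(lodo_splits(cohorts))

scores = {
    "SKB": {"f1": 3.0, "f2": 2.0, "f3": 1.0},
    "IG": {"f1": 0.1, "f2": 0.5, "f3": 0.9},
}
selected = combine_top_k(scores, k=1)
```

In a full pipeline, each (train_idx, test_idx) pair would be used to fit a classifier on the training cohorts and score it on the held-out cohort, restricted to the selected features.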

Files

Original bundle

Name:
1-s2.0-S0010482524011831-main.pdf
Size:
6.3 MB
Format:
Adobe Portable Document Format
Description:
Article File

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed upon to submission
Description: