CCPred: Global and Population-Specific Colorectal Cancer Prediction and Metagenomic Biomarker Identification at Different Molecular Levels Using Machine Learning Techniques

dc.contributor.author Bakir-Güngör, Burcu
dc.contributor.author Temiz, Mustafa
dc.contributor.author Inal, Yasin
dc.contributor.author Cicekyurt, Emre
dc.contributor.author Yousef, Malik
dc.date.accessioned 2025-09-25T10:42:03Z
dc.date.available 2025-09-25T10:42:03Z
dc.date.issued 2024
dc.description.abstract Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy could be improved by taking a holistic view of the underlying biological knowledge. This study evaluates CRC-associated metagenomic data at the species, enzyme, and pathway levels through global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. The population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′,3′-cyclic 3′-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED. © 2024 Elsevier B.V., All rights reserved. en_US
dc.identifier.doi 10.1016/j.compbiomed.2024.109098
dc.identifier.issn 1879-0534
dc.identifier.issn 0010-4825
dc.identifier.scopus 2-s2.0-85203806584
dc.identifier.uri https://doi.org/10.1016/j.compbiomed.2024.109098
dc.identifier.uri https://hdl.handle.net/20.500.12573/3405
dc.language.iso en en_US
dc.publisher Elsevier Ltd en_US
dc.relation.ispartof Computers in Biology and Medicine en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Biomarkers en_US
dc.subject Colorectal Cancer en_US
dc.subject Enzyme en_US
dc.subject Machine Learning en_US
dc.subject Metagenomic en_US
dc.subject Microbiome en_US
dc.subject Pathway en_US
dc.subject Species en_US
dc.subject Phosphodiesterase en_US
dc.subject Pyruvic Acid en_US
dc.subject Biomarkers, Tumor en_US
dc.subject Adversarial Machine Learning en_US
dc.subject Lung Cancer en_US
dc.subject Biomarker Identification en_US
dc.subject Cancer Prediction en_US
dc.subject Machine Learning Techniques en_US
dc.subject Machine-Learning en_US
dc.subject Metagenomics en_US
dc.subject Molecular Levels en_US
dc.subject Plant Diseases en_US
dc.subject RNA 2',3' Cyclic 3' Phosphodiesterase en_US
dc.subject Tumor Marker en_US
dc.subject Unclassified Drug en_US
dc.subject Adult en_US
dc.subject Aged en_US
dc.subject Anaerobic Bacterium en_US
dc.subject Area Under The Curve en_US
dc.subject Article en_US
dc.subject Classification Algorithm en_US
dc.subject Computer Model en_US
dc.subject Controlled Study en_US
dc.subject Decision Tree en_US
dc.subject Diagnostic Accuracy en_US
dc.subject Feature Selection Algorithm en_US
dc.subject Female en_US
dc.subject Human en_US
dc.subject Intestine Flora en_US
dc.subject Leave One Out Cross Validation en_US
dc.subject Major Clinical Study en_US
dc.subject Male en_US
dc.subject Monte Carlo Cross Validation en_US
dc.subject Population Research en_US
dc.subject Prediction en_US
dc.subject Random Forest en_US
dc.subject Ruthenibacterium Lactatiformans en_US
dc.subject Sensitivity and Specificity en_US
dc.subject Colorectal Tumor en_US
dc.subject Genetics en_US
dc.subject Metabolism en_US
dc.subject Metagenome en_US
dc.subject Microbiology en_US
dc.subject Procedures en_US
dc.subject Software en_US
dc.subject Colorectal Neoplasms en_US
dc.subject Gastrointestinal Microbiome en_US
dc.subject Humans en_US
dc.title CCPred: Global and Population-Specific Colorectal Cancer Prediction and Metagenomic Biomarker Identification at Different Molecular Levels Using Machine Learning Techniques en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 25932029800
gdc.author.scopusid 57219794472
gdc.author.scopusid 59169834900
gdc.author.scopusid 59325097800
gdc.author.scopusid 14029389000
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Bakir-Güngör] Burcu, Department of Computer Engineering, Abdullah Gül Üniversitesi, Kayseri, Turkey; [Temiz] Mustafa, Department of Electrical & Computer Engineering, Abdullah Gül Üniversitesi, Kayseri, Turkey; [Inal] Yasin, Department of Computer Engineering, Abdullah Gül Üniversitesi, Kayseri, Turkey; [Cicekyurt] Emre, Department of Computer Engineering, Abdullah Gül Üniversitesi, Kayseri, Turkey; [Yousef] Malik, Department of Information Systems, Zefat Academic College, Safad, Israel en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.volume 182 en_US
gdc.description.wosquality Q1
gdc.identifier.openalex W4402574029
gdc.identifier.pmid 39293338
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.diamondjournal false
gdc.oaire.impulse 5.0
gdc.oaire.influence 2.6174058E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Machine Learning
gdc.oaire.keywords Biomarkers, Tumor
gdc.oaire.keywords Humans
gdc.oaire.keywords Metagenome
gdc.oaire.keywords Metagenomics
gdc.oaire.keywords Colorectal Neoplasms
gdc.oaire.keywords Software
gdc.oaire.keywords Gastrointestinal Microbiome
gdc.oaire.popularity 5.356467E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 1.2503
gdc.openalex.normalizedpercentile 0.79
gdc.opencitations.count 3
gdc.plumx.mendeley 14
gdc.plumx.scopuscites 4
gdc.scopus.citedcount 4
gdc.virtual.author Güngör, Burcu
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef
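
Note on the evaluation terms used in the abstract: the leave-one-dataset-out (LODO) scheme and the AUC metric can be illustrated with a minimal sketch that uses only the Python standard library. This is not the CCPred code from the linked repository. The function names `lodo_splits` and `roc_auc` and the cohort labels in the usage example are hypothetical, and any classifier (such as the Random Forest mentioned in the abstract) would be trained and scored between the two steps.

```python
from collections import defaultdict


def lodo_splits(cohorts):
    """Yield (held_out, train_idx, test_idx) once per distinct cohort label.

    `cohorts` holds one dataset/population label per sample. Each cohort is
    held out in turn as the test set; all other samples form the training set.
    """
    by_cohort = defaultdict(list)
    for i, cohort in enumerate(cohorts):
        by_cohort[cohort].append(i)
    for held_out, test_idx in by_cohort.items():
        train_idx = [i for i, c in enumerate(cohorts) if c != held_out]
        yield held_out, train_idx, test_idx


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample (label 1)
    receives a higher score than a randomly chosen negative sample (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical cohort labels, one per sample.
    cohorts = ["A", "A", "B", "C", "B"]
    for held_out, train_idx, test_idx in lodo_splits(cohorts):
        print(held_out, train_idx, test_idx)
    # Hypothetical classifier scores for four samples.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would be the usual choice for large cohorts.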

Files

Original bundle

Name: 1-s2.0-S0010482524011831-main.pdf
Size: 6.3 MB
Format: Adobe Portable Document Format
Description: Article File

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission