CCPred: Global and Population-Specific Colorectal Cancer Prediction and Metagenomic Biomarker Identification at Different Molecular Levels Using Machine Learning Techniques
Date
2024
Publisher
Elsevier Ltd
Open Access Color
Green Open Access
No
Publicly Funded
No
Abstract
Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy could be improved by taking a holistic view of biological knowledge. This study evaluates CRC-associated metagenomic data at the species, enzyme, and pathway levels through global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. The population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2′,3′-cyclic 3′-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED. © 2024 Elsevier B.V., All rights reserved.
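The feature combination and LODO evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that) and rests on several assumptions: "combined" is read as the union of each method's top-k features, SelectKBest is scored with the ANOVA F-test, Information Gain is approximated by scikit-learn's mutual information estimator, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the abundance table, labels, and cohort names are synthetic. The helper names `combined_feature_indices` and `lodo_auc` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut


def combined_feature_indices(X, y, k=10, seed=0):
    """Union of the top-k features from three rankers (union rule is an assumption)."""
    # 1) SelectKBest with the ANOVA F-test as the scoring function (assumed).
    skb = SelectKBest(f_classif, k=k).fit(X, y)
    skb_idx = set(np.flatnonzero(skb.get_support()))
    # 2) Information gain, approximated by estimated mutual information.
    mi = mutual_info_classif(X, y, random_state=seed)
    ig_idx = set(np.argsort(mi)[-k:])
    # 3) Gradient-boosting importances (stand-in for XGBoost).
    gb = GradientBoostingClassifier(random_state=seed).fit(X, y)
    gb_idx = set(np.argsort(gb.feature_importances_)[-k:])
    return sorted(int(i) for i in skb_idx | ig_idx | gb_idx)


def lodo_auc(X, y, groups, k=10, seed=0):
    """Leave-one-dataset-out: train on all cohorts but one, test on the held-out one."""
    aucs = {}
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        # Select features on the training cohorts only, to avoid leakage.
        cols = combined_feature_indices(X[train], y[train], k=k, seed=seed)
        clf = RandomForestClassifier(n_estimators=200, random_state=seed)
        clf.fit(X[train][:, cols], y[train])
        scores = clf.predict_proba(X[test][:, cols])[:, 1]
        aucs[str(groups[test][0])] = roc_auc_score(y[test], scores)
    return aucs


# Synthetic relative-abundance table: 240 samples, 40 features, 3 cohorts.
rng = np.random.default_rng(0)
n, p = 240, 40
X = rng.dirichlet(np.ones(p), size=n)      # each row sums to 1
y = rng.integers(0, 2, size=n)             # 0 = control, 1 = CRC (synthetic labels)
X[y == 1, :3] += 0.05                      # inject signal into three features
X = X / X.sum(axis=1, keepdims=True)       # renormalize to relative abundance
groups = np.repeat(np.array(["cohort_A", "cohort_B", "cohort_C"]), n // 3)

aucs = lodo_auc(X, y, groups)
for cohort, auc in aucs.items():
    print(f"held-out {cohort}: AUC = {auc:.2f}")
```

Selecting features inside each fold, rather than once on the full table, keeps the held-out cohort out of the selection step; whether the study does the same is not stated in the abstract.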
Keywords
Biomarkers; Colorectal Cancer; Enzyme; Machine Learning; Metagenomic; Microbiome; Pathway; Species; Phosphodiesterase; Pyruvic Acid; Biomarkers, Tumor; Adversarial Machine Learning; Lung Cancer; Biomarker Identification; Cancer Prediction; Machine Learning Techniques; Metagenomics; Molecular Levels; Plant Diseases; RNA 2',3'-Cyclic 3'-Phosphodiesterase; Tumor Marker; Unclassified Drug; Adult; Aged; Anaerobic Bacterium; Area Under The Curve; Article; Classification Algorithm; Computer Model; Controlled Study; Decision Tree; Diagnostic Accuracy; Feature Selection Algorithm; Female; Human; Intestine Flora; Leave One Out Cross Validation; Major Clinical Study; Male; Monte Carlo Cross Validation; Population Research; Prediction; Random Forest; Ruthenibacterium lactatiformans; Sensitivity and Specificity; Colorectal Tumor; Genetics; Metabolism; Metagenome; Microbiology; Procedures; Software; Colorectal Neoplasms; Gastrointestinal Microbiome; Humans
WoS Q
Q1
Scopus Q
Q1

OpenCitations Citation Count
3
Source
Computers in Biology and Medicine
Volume
182
PlumX Metrics
Citations
Scopus : 4
Captures
Mendeley Readers : 14
SCOPUS™ Citations
4
checked on Mar 06, 2026
Page Views
1
checked on Mar 06, 2026
Downloads
9
checked on Mar 06, 2026
OpenAlex FWCI
1.2503
Sustainable Development Goals
3
GOOD HEALTH AND WELL-BEING


