Prediction of Colorectal Cancer Based on Taxonomic Levels of Microorganisms and Discovery of Taxonomic Biomarkers Using the Grouping-Scoring-Modeling (G-S-M) Approach

Abstract

Colorectal cancer (CRC) is one of the most prevalent forms of cancer globally. The human gut microbiome plays an important role in the development of CRC and can serve as a biomarker for early detection and treatment. This study focuses on identifying potential taxonomic biomarkers of CRC using a grouping-based feature selection method. It also investigates the effect of incorporating biological domain knowledge into the feature selection process when identifying CRC-associated microorganisms. Conventional feature selection techniques often fail to leverage existing biological knowledge during metagenomic data analysis. To address this gap, we propose a taxonomy-based Grouping-Scoring-Modeling (G-S-M) method that integrates biological domain knowledge into feature grouping and selection. Using CRC-related metagenomic data, classification is performed at three taxonomic levels (genus, family, and order). The MetaPhlAn tool is used to determine the relative abundance of species in each sample. Comparative performance analyses involve six feature selection methods and four classification algorithms. In experiments on two CRC-associated metagenomic datasets, the highest performance, an AUC of 0.90, is observed at the genus level. At this level, 7 of the top 10 groups (Parvimonas, Peptostreptococcus, Fusobacterium, Gemella, Streptococcus, Porphyromonas, and Solobacterium) were identified in both datasets. Moreover, the microorganisms identified at the genus, family, and order levels are discussed in detail with reference to the CRC-related metagenomic literature. This study not only contributes to our understanding of CRC development but also highlights the applicability of the taxonomy-based G-S-M method to various diseases. © 2025 Elsevier B.V., All rights reserved.
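The grouping and scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes species columns are labelled with MetaPhlAn-style lineage strings (abbreviated here to order, family, genus, species), and it swaps in a simple nearest-centroid classifier with leave-one-out accuracy in place of the classifiers and the AUC metric used in the study. The function names (`group_features`, `loo_accuracy`, `rank_groups`) and the abundance values are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# MetaPhlAn-style rank prefixes for the three levels named in the abstract.
LEVEL_PREFIX = {"order": "o__", "family": "f__", "genus": "g__"}


def group_features(lineages, level):
    """Grouping: map each taxon at `level` to the species column indices under it."""
    prefix = LEVEL_PREFIX[level]
    groups = defaultdict(list)
    for idx, lineage in enumerate(lineages):
        for rank in lineage.split("|"):
            if rank.startswith(prefix):
                groups[rank[len(prefix):]].append(idx)
    return dict(groups)


def loo_accuracy(X, y, cols):
    """Scoring: leave-one-out accuracy of a nearest-centroid classifier on `cols`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda lab: sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def rank_groups(X, y, lineages, level):
    """Score every taxonomic group and return (name, score) pairs, best first."""
    groups = group_features(lineages, level)
    scores = {name: loo_accuracy(X, y, cols) for name, cols in groups.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: abundance values are made up purely for illustration.
lineages = [
    "o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_nucleatum",
    "o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_mortiferum",
    "o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilis",
    "o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_vulgatus",
]
X = [
    [5.0, 3.0, 10.0, 20.0],  # CRC sample
    [6.0, 4.0, 20.0, 10.0],  # CRC sample
    [0.5, 0.2, 10.0, 20.0],  # control sample
    [0.1, 0.4, 20.0, 10.0],  # control sample
]
y = [1, 1, 0, 0]  # 1 = CRC, 0 = control

ranking = rank_groups(X, y, lineages, "genus")
print(ranking)
```

In a fuller pipeline, the modeling step would train a final classifier on the union of the columns belonging to the top-ranked groups; changing the `level` argument to `"family"` or `"order"` regroups the same species columns at a coarser taxonomic level.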

Keywords

Colorectal Cancer, Human Gut Microbiome, Machine Learning, Microbiomegsm, Taxonomic Biomarkers, Biomarkers, Tumor, Lung Cancer, Features Selection, Human Guts, Machine-Learning, Metagenomics, Microbiome, Scoring Models, Taxonomic Biomarker, Article, Desulfovibrionales, DNA Damage, Feature Selection, Fusobacteriaceae, Fusobacterium, Gemella, Human, Intestine Flora, Lachnospiraceae, Lactobacillales, Microbial Community, Nonhuman, Parvimonas, Peptostreptococcaceae, Peptostreptococcus, Performance Indicator, Porphyromonas, Prediction, Solobacterium, Streptococcaceae, Streptococcus, Streptococcus Equinus, Taxonomy, Algorithm, Bacterium, Classification, Colorectal Tumor, Genetics, Microbiology, Procedures, Tumor Marker, Algorithms, Bacteria, Colorectal Neoplasms, Gastrointestinal Microbiome, Humans

Volume

187

Start Page

109813

PlumX Metrics
Citations

Scopus : 2

Captures

Mendeley Readers : 13

SCOPUS™ Citations

2

checked on Jun 05, 2026

OpenAlex FWCI
0.69

Sustainable Development Goals

3. GOOD HEALTH AND WELL-BEING