Identifying Taxonomic Biomarkers of Colorectal Cancer in Human Intestinal Microbiota Using Multiple Feature Selection Methods

Loading...
Publication Logo

Date

2022

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers Inc.

Open Access Color

Green Open Access

No

OpenAIRE Downloads

OpenAIRE Views

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Research Projects

Journal Issue

Abstract

A variety of bacterial species called gut microbiota work together to maintain a steady intestinal environment. The gastrointestinal tract contains tremendous amount of different species including archaea, bacteria, fungi, and viruses. While these organisms are crucial immune system stabilizers, the dysbiosis of the intestinal flora has been related to gastrointestinal disorders including Colorectal cancer (CRC), intestinal cancer, irritable bowel syndrome and inflammatory bowel disease. In the last decade, next-generation sequencing (NGS) methods have accelerated the identification of human gut flora. CRC is a deathly condition that has been on the rise in the last century, affecting half a million people each year. Since early CRC diagnosis is critical for an effective treatment, there is an immediate requirement for a classification system that can expedite CRC diagnosis. In this study, via analyzing the available metagenomics data on CRC, we aim to facilitate the CRC diagnosis via finding biomarkers linked with CRC, and via building a classification model. We have obtained the metagenomic sequencing data of the healthy individuals and CRC patients from a metagenome-wide association analysis and we have classified this data according to the disease stages. Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Extreme Gradient Boosting (XGBoost), min redundancy max relevance (mRMR), Information Gain (IG) and Select K Best (SKB) feature selection algorithms were utilized to cope with the complexity of the features. We observed that the SKB, IG, and XGBoost techniques made significant contributions to decrease the microbiota in use for CRC diagnosis, thereby reducing cost and time. We realized that our Random Forest classifier outperformed Adaboost, Support Vector Machine, Decision Tree, Logitboost and stacking ensemble classifiers in terms of CRC classification performance. Our results reiterated some known and some potential microbiome associated mechanisms in CRC, which could aid the design of new diagnostics based on the microbiome. © 2022 Elsevier B.V., All rights reserved.

Description

Keywords

Biomarker Discovery, Classification, Feature Selection, Human Gut Microbiome, Metagenomics, Adaptive Boosting, Bacteria, Biomarkers, Computer Aided Diagnosis, Decision Trees, Diseases, Feature Selection, Support Vector Machines, Viruses, Bio-Marker Discovery, Colorectal Cancer Diagnosis, Features Selection, Human Gut Microbiome, Human Guts, Information Gain, Metagenomics, Microbiome, Microbiotas, Multiple Features, Classification (Of Information)

Fields of Science

0301 basic medicine, 0303 health sciences, 03 medical and health sciences

Citation

WoS Q

N/A

Scopus Q

N/A
OpenCitations Logo
OpenCitations Citation Count
3

Source

-- 2022 Innovations in Intelligent Systems and Applications Conference, ASYU 2022 -- Antalya; Akdeniz University -- 183936

Volume

Issue

Start Page

1

End Page

6
PlumX Metrics
Citations

Scopus : 5

Captures

Mendeley Readers : 3

SCOPUS™ Citations

5

checked on Mar 06, 2026

Page Views

1

checked on Mar 06, 2026

Downloads

5

checked on Mar 06, 2026

Google Scholar Logo
Google Scholar™
OpenAlex Logo
OpenAlex FWCI
2.2054

Sustainable Development Goals

3

GOOD HEALTH AND WELL-BEING
GOOD HEALTH AND WELL-BEING Logo