Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Different Feature Selection Methods

Date

2022

Publisher

PeerJ Inc

Open Access Color

GOLD

Green Open Access

Yes

OpenAIRE Downloads

93

OpenAIRE Views

131

Publicly Funded

No
Impulse
Top 1%
Influence
Top 10%
Popularity
Top 10%

Abstract

The tremendous boost in next-generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, alteration of the complexity and eubiotic state of the microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as a westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from complete. Due to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches.
To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), minimum redundancy maximum relevance (mRMR), Select K Best (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), the XGBoost, IG, and SKB methods showed a considerable effect in minimizing the number of microbiota used for the diagnosis of IBD, thus reducing cost and time. Compared to Decision Tree, Support Vector Machine, LogitBoost, AdaBoost, and stacking ensemble classifiers, our Random Forest classifier achieved better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD, and these findings might be useful for the development of microbiome-based diagnostics.
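
The evaluation scheme described in the abstract (feature selection assessed under 100-fold Monte Carlo cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline. Several things here are assumptions made for illustration: the toy data generator, the 20% test fraction, the mean-difference feature score (a simple stand-in for filters such as SKB or IG), and the nearest-centroid classifier (a stand-in for Random Forest). All function names are hypothetical. The point it shows is that features are re-selected on the training portion of every random split, so the held-out samples never influence which features are kept.

```python
import random
from statistics import mean


def mccv_splits(n_samples, n_splits=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) for repeated random splits (Monte Carlo CV)."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


def top_k_features(X, y, train_idx, k):
    """Rank features by |mean(class 1) - mean(class 0)| on training rows only.

    Assumed stand-in for a univariate filter; both classes must be present.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in train_idx if y[i] == 1]
        neg = [X[i][j] for i in train_idx if y[i] == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def nearest_centroid_accuracy(X, y, train_idx, test_idx, features):
    """Fit class centroids on training rows, score accuracy on test rows."""
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for i in test_idx:
        dists = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test_idx)


def make_toy_data(n_samples=40, n_features=30, n_informative=3, seed=1):
    """Synthetic abundance-like table: the first features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def run_mccv(X, y, k=3, n_splits=100):
    """Return mean test accuracy and how often each feature was selected."""
    accuracies = []
    selection_counts = [0] * len(X[0])
    for train_idx, test_idx in mccv_splits(len(X), n_splits=n_splits):
        features = top_k_features(X, y, train_idx, k)
        for j in features:
            selection_counts[j] += 1
        accuracies.append(
            nearest_centroid_accuracy(X, y, train_idx, test_idx, features)
        )
    return mean(accuracies), selection_counts


if __name__ == "__main__":
    X, y = make_toy_data()
    accuracy, counts = run_mccv(X, y)
    print(f"mean accuracy over splits: {accuracy:.3f}")
    print("selection counts for first five features:", counts[:5])
```

Counting how often each feature is selected across splits gives a rough stability measure: features chosen in most splits are more plausible biomarker candidates than features chosen only occasionally.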

Description

Hacilar, Hilal (ORCID: 0000-0002-5811-6722)

Keywords

Feature Selection; Human Gut Microbiome; Biomarker Discovery; Classification; Metagenomics; QH301-705.5; Bioinformatics; R; Inflammatory Bowel Diseases; Gastrointestinal Microbiome; Crohn Disease; Medicine; Humans; Colitis, Ulcerative; Biology (General); Biomarkers

Fields of Science

0301 basic medicine, 0303 health sciences, 03 medical and health sciences

WoS Q

Q2

Scopus Q

Q3
OpenCitations Citation Count
32

Source

PeerJ

Volume

10

Start Page

e13205

PlumX Metrics
Citations

CrossRef : 23

Scopus : 39

PubMed : 22

Captures

Mendeley Readers : 101

SCOPUS™ Citations

40

checked on Mar 04, 2026

Web of Science™ Citations

34

checked on Mar 04, 2026

Page Views

1

checked on Mar 04, 2026

Downloads

5

checked on Mar 04, 2026

OpenAlex FWCI
3.7743

Sustainable Development Goals

3

GOOD HEALTH AND WELL-BEING