Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Different Feature Selection Methods
Date
2022
Publisher
PeerJ Inc
Open Access Color
GOLD
Green Open Access
Yes
OpenAIRE Downloads
93
OpenAIRE Views
131
Publicly Funded
No
Abstract
The rapid advance of next-generation sequencing and "omics" technologies has made it possible to characterize the human gut microbiome, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered essential regulators of our immune system, alterations in the complexity and eubiotic state of the microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease of unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as a westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from complete. Owing to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study applies both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches.
To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), Minimum Redundancy Maximum Relevance (mRMR), Select K Best (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), the XGBoost, IG, and SKB methods considerably reduced the number of microbiota needed for the diagnosis of IBD, and thus the associated cost and time. Compared to Decision Tree, Support Vector Machine, LogitBoost, AdaBoost, and stacking ensemble classifiers, our Random Forest classifier yielded better performance measures for the classification of IBD. Our findings reveal potential microbiome-mediated mechanisms of IBD and might be useful for the development of microbiome-based diagnostics.
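The evaluation scheme described in the abstract, feature selection repeated inside each Monte Carlo cross-validation split, can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the toy presence/absence data, the function names, and the parameters (`k`, `test_fraction`, seeds) are all assumptions made for illustration, and a nearest-centroid rule stands in for the Random Forest classifier so that the example has no external dependencies. Only the Information Gain ranking and the repeated random splitting correspond to techniques named above.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature with respect to the labels."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def top_k_features(X, y, k):
    """Indices of the k features with the highest information gain."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]


def monte_carlo_splits(n_samples, n_splits, test_fraction, rng):
    """Yield (train_idx, test_idx) pairs from repeated random subsampling."""
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Stand-in classifier (NOT Random Forest): assign the closest class centroid."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]

    def distance(x, label):
        return sum((x[j] - c) ** 2 for j, c in zip(features, centroids[label]))

    return [min(centroids, key=lambda label: distance(x, label)) for x in X_test]


def mccv_accuracy(X, y, k=5, n_splits=100, test_fraction=0.2, seed=0):
    """Per-split accuracy; features are re-selected on each training set only."""
    rng = random.Random(seed)
    accuracies = []
    for train_idx, test_idx in monte_carlo_splits(len(X), n_splits, test_fraction, rng):
        X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
        features = top_k_features(X_train, y_train, k)
        preds = nearest_centroid_predict(X_train, y_train, X_test, features)
        accuracies.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
    return accuracies


def make_toy_data(n_samples=120, n_features=40, n_informative=3, seed=1):
    """Hypothetical presence/absence table; the first columns carry the signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [int(rng.random() < 0.5) for _ in range(n_features)]
        for j in range(n_informative):
            # Informative "taxa" are present far more often in class 1.
            row[j] = int(rng.random() < (0.9 if label else 0.1))
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
accuracies = mccv_accuracy(X, y, k=5, n_splits=100)
print(f"mean accuracy over {len(accuracies)} splits: {sum(accuracies) / len(accuracies):.3f}")
```

Selecting features inside the loop, on the training portion of each split only, keeps the held-out samples from influencing which features are chosen; ranking features once on the full dataset would tend to give optimistic accuracy estimates.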
Description
Hacilar, Hilal/0000-0002-5811-6722
ORCID
Keywords
Feature Selection, Human Gut Microbiome, Biomarker Discovery, Classification, Metagenomics, QH301-705.5, Bioinformatics, R, Inflammatory Bowel Diseases, Gastrointestinal Microbiome, Crohn Disease, Medicine, Humans, Colitis, Ulcerative, Biology (General), Biomarkers
Fields of Science
0301 basic medicine, 0303 health sciences, 03 medical and health sciences
Citation
WoS Q
Q2
Scopus Q
Q3

OpenCitations Citation Count
32
Source
PeerJ
Volume
10
Issue
Start Page
e13205
End Page
PlumX Metrics
Citations
CrossRef : 23
Scopus : 39
PubMed : 22
Captures
Mendeley Readers : 101
SCOPUS™ Citations
40
checked on Mar 04, 2026
Web of Science™ Citations
34
checked on Mar 04, 2026
Page Views
1
checked on Mar 04, 2026
Downloads
5
checked on Mar 04, 2026
OpenAlex FWCI
3.7743
Sustainable Development Goals
3
GOOD HEALTH AND WELL-BEING


