microBiomeGSM: the identification of taxonomic biomarkers from metagenomic data using grouping, scoring and modeling (G-S-M) approach

dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Temiz, Mustafa
dc.contributor.author Jabeer, Amhar
dc.contributor.author Wu, Di
dc.contributor.author Yousef, Malik
dc.contributor.authorID 0000-0002-2272-6270 en_US
dc.contributor.authorID 0000-0002-2839-1424 en_US
dc.contributor.department AGÜ, Faculty of Engineering, Department of Computer Engineering en_US
dc.contributor.institutionauthor Bakir-Gungor, Burcu
dc.contributor.institutionauthor Temiz, Mustafa
dc.contributor.institutionauthor Jabeer, Amhar
dc.date.accessioned 2024-01-12T08:00:47Z
dc.date.available 2024-01-12T08:00:47Z
dc.date.issued 2023 en_US
dc.description.abstract Numerous biological environments have been characterized with the advent of metagenomic sequencing based on next-generation sequencing, which provides the relative abundance values of microbial taxa. Modeling the human microbiome using machine learning models has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis. The microBiomeGSM tool presented in this work incorporates pre-existing taxonomy information into a machine learning approach and attempts to solve the classification problem in disease-associated metagenomics datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species-level information is used as features and classified in relation to its taxonomy at different levels, including genus, family, and order. Using four different disease-associated metagenomics datasets, the performance of microBiomeGSM is compared with other feature selection methods such as Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Minimum Redundancy Maximum Relevance (MRMR) and Information Gain (IG), and with other classifiers such as AdaBoost, Decision Tree, LogitBoost and Random Forest. microBiomeGSM achieved its highest result, an area under the curve (AUC) value of 0.98, at the order taxonomic level for the IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups that are identified as important for the disease under study and the names of the species within these groups. The association between the detected species and the disease under investigation is confirmed by previous studies in the literature. The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM. en_US
dc.description.sponsorship We extend our gratitude to COST ML4Microbiome Action for the funding, which has played a pivotal role in advancing microbiome research and facilitating the expansion of these research endeavors. This research was made possible by the generous support of the L’Oréal-UNESCO Young Women Scientist Program. BB-G would like to express her gratitude for the L’Oréal-UNESCO Young Women Scientist Award, received in 2022. en_US
dc.identifier.endpage 18 en_US
dc.identifier.issn 1664-302X
dc.identifier.other WOS:001117805800001
dc.identifier.startpage 1 en_US
dc.identifier.uri https://doi.org/10.3389/fmicb.2023.1264941
dc.identifier.uri https://hdl.handle.net/20.500.12573/1891
dc.identifier.volume 14 en_US
dc.language.iso eng en_US
dc.publisher FRONTIERS MEDIA SA en_US
dc.relation.isversionof 10.3389/fmicb.2023.1264941 en_US
dc.relation.journal FRONTIERS IN MICROBIOLOGY en_US
dc.relation.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject gut microbiome en_US
dc.subject metagenomics en_US
dc.subject type 2 diabetes en_US
dc.subject inflammatory bowel disease en_US
dc.subject colorectal cancer en_US
dc.subject machine learning en_US
dc.subject classification en_US
dc.subject feature selection en_US
dc.title microBiomeGSM: the identification of taxonomic biomarkers from metagenomic data using grouping, scoring and modeling (G-S-M) approach en_US
dc.type article en_US

Files

Original bundle

Name: fmicb-14-1264941.pdf
Size: 10.09 MB
Format: Adobe Portable Document Format
Description: Article File

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission