MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach

dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Temiz, Mustafa
dc.contributor.author Jabeer, Amhar
dc.contributor.author Wu, Di
dc.contributor.author Yousef, Malik
dc.date.accessioned 2025-09-25T10:50:43Z
dc.date.available 2025-09-25T10:50:43Z
dc.date.issued 2023
dc.description Temiz, Mustafa/0000-0002-2839-1424; en_US
dc.description.abstract Numerous biological environments have been characterized since the advent of metagenomic sequencing based on next-generation sequencing, which yields the relative abundance values of microbial taxa. Modeling the human microbiome with machine learning has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of diseases such as inflammatory bowel disease, diabetes, colorectal cancer, and many others. The goal of this study is to develop an effective classification model for the analysis of metagenomic datasets associated with different diseases. In this way, we aim to identify taxonomic biomarkers associated with these diseases and facilitate disease diagnosis. The microBiomeGSM tool presented in this work incorporates pre-existing taxonomy information into a machine learning approach and attempts to solve the classification problem in disease-associated metagenomics datasets. Based on the G-S-M (Grouping-Scoring-Modeling) approach, species-level information is used as features, and these features are grouped by their taxonomic assignment at different levels, including genus, family, and order. Using four different disease-associated metagenomics datasets, the performance of microBiomeGSM is compared with that of other feature selection methods, namely Fast Correlation Based Filter (FCBF), Select K Best (SKB), Extreme Gradient Boosting (XGB), Conditional Mutual Information Maximization (CMIM), Minimum Redundancy Maximum Relevance (MRMR) and Information Gain (IG), and with other classifiers such as AdaBoost, Decision Tree, LogitBoost and Random Forest. microBiomeGSM achieved its highest result, an area under the curve (AUC) value of 0.98, at the order taxonomic level for the IBDMD dataset. Another significant output of microBiomeGSM is the list of taxonomic groups that are identified as important for the disease under study and the names of the species within these groups.
The association between the detected species and the disease under investigation is confirmed by previous studies in the literature. The microBiomeGSM tool and other supplementary files are publicly available at: https://github.com/malikyousef/microBiomeGSM. en_US
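The Grouping-Scoring-Modeling idea described in the abstract can be illustrated with a small sketch. This is not the microBiomeGSM implementation: the toy taxonomy, the abundance values, and all function names are hypothetical, the scoring step uses a simple gap between class means as a stand-in, and the modeling step uses a nearest-centroid rule in place of the classifiers named in the abstract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: species -> genus, and (relative abundances, label) per sample.
TAXONOMY = {"sp_a": "GenusX", "sp_b": "GenusX", "sp_c": "GenusY", "sp_d": "GenusZ"}
SAMPLES = [
    ({"sp_a": 0.40, "sp_b": 0.30, "sp_c": 0.20, "sp_d": 0.10}, 1),
    ({"sp_a": 0.35, "sp_b": 0.35, "sp_c": 0.18, "sp_d": 0.12}, 1),
    ({"sp_a": 0.10, "sp_b": 0.05, "sp_c": 0.22, "sp_d": 0.63}, 0),
    ({"sp_a": 0.12, "sp_b": 0.08, "sp_c": 0.19, "sp_d": 0.61}, 0),
]

def group_species(taxonomy):
    """Grouping: collect species under their parent taxon."""
    groups = defaultdict(list)
    for species, taxon in taxonomy.items():
        groups[taxon].append(species)
    return dict(groups)

def score_group(species, samples):
    """Scoring (stand-in): gap between class means of the group's summed abundance."""
    by_class = defaultdict(list)
    for abundances, label in samples:
        by_class[label].append(sum(abundances[s] for s in species))
    return abs(mean(by_class[1]) - mean(by_class[0]))

def select_features(taxonomy, samples, top_k):
    """Keep the species that belong to the top_k highest-scoring groups."""
    groups = group_species(taxonomy)
    ranked = sorted(groups, key=lambda g: score_group(groups[g], samples), reverse=True)
    top = ranked[:top_k]
    return top, sorted(s for g in top for s in groups[g])

def fit_centroids(samples, features):
    """Modeling (stand-in): one mean abundance vector per class over the selected species."""
    by_class = defaultdict(list)
    for abundances, label in samples:
        by_class[label].append([abundances[f] for f in features])
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_class.items()}

def predict(centroids, features, abundances):
    """Assign the label of the closest class centroid (squared Euclidean distance)."""
    x = [abundances[f] for f in features]
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=dist)

top_groups, features = select_features(TAXONOMY, SAMPLES, top_k=2)
centroids = fit_centroids(SAMPLES, features)
```

Besides the fitted model, the sketch returns `top_groups` and `features`, mirroring the abstract's point that the ranked taxonomic groups and the species inside them are themselves an output of interest.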
dc.description.sponsorship L'Oréal-UNESCO Young Women Scientist Program; Abdullah Gul University Support Foundation (AGUV); Zefat Academic College; COST (European Cooperation in Science and Technology) [CA18131] en_US
dc.description.sponsorship The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The work of BB-G has been supported by the L'Oréal-UNESCO Young Women Scientist Program and by the Abdullah Gul University Support Foundation (AGUV). The work of MY has been supported by the Zefat Academic College. This article is based upon work from COST Action ML4Microbiome (CA18131), supported by COST (European Cooperation in Science and Technology), www.cost.eu, which has played a pivotal role in advancing microbiome research and facilitating the expansion of these research endeavours. en_US
dc.description.sponsorship Abdullah Gul University Support Foundation; Zefat Academic College; European Cooperation in Science and Technology, COST, (CA18131); European Cooperation in Science and Technology, COST
dc.identifier.doi 10.3389/fmicb.2023.1264941
dc.identifier.issn 1664-302X
dc.identifier.scopus 2-s2.0-85178880745
dc.identifier.uri https://doi.org/10.3389/fmicb.2023.1264941
dc.identifier.uri https://hdl.handle.net/20.500.12573/4196
dc.language.iso en en_US
dc.publisher Frontiers Media S.A. en_US
dc.relation.ispartof Frontiers in Microbiology en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Gut Microbiome en_US
dc.subject Metagenomics en_US
dc.subject Type 2 Diabetes en_US
dc.subject Inflammatory Bowel Disease en_US
dc.subject Colorectal Cancer en_US
dc.subject Machine Learning en_US
dc.subject Classification en_US
dc.subject Feature Selection en_US
dc.title MicroBiomeGSM: The Identification of Taxonomic Biomarkers From Metagenomic Data Using Grouping, Scoring and Modeling (G-S-M) Approach en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Temiz, Mustafa/0000-0002-2839-1424
gdc.author.scopusid 25932029800
gdc.author.scopusid 57219794472
gdc.author.scopusid 57221663697
gdc.author.scopusid 56669013600
gdc.author.scopusid 14029389000
gdc.author.wosid Temiz, Mustafa/Kzu-4768-2024
gdc.author.wosid Wu, Di/Hnp-3772-2023
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Bakir-Gungor, Burcu; Jabeer, Amhar] Abdullah Gul Univ, Fac Engn, Dept Comp Engn, Kayseri, Turkiye; [Temiz, Mustafa] Abdullah Gul Univ, Fac Engn, Dept Elect & Comp Engn, Kayseri, Turkiye; [Wu, Di] Univ North Carolina Chapel Hill, Dept Biostat, Chapel Hill, NC USA; [Wu, Di] Univ North Carolina Chapel Hill, Adams Sch Dent, Div Oral & Craniofacial Hlth Sci, Chapel Hill, NC USA; [Yousef, Malik] Zefat Acad Coll, Dept Informat Syst, Safed, Israel; [Yousef, Malik] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, Safed, Israel en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.volume 14 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W4388940431
gdc.identifier.pmid 38075911
gdc.identifier.wos WOS:001117805800001
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 80
gdc.oaire.impulse 11.0
gdc.oaire.influence 2.8552405E-9
gdc.oaire.isgreen true
gdc.oaire.keywords metagenomics
gdc.oaire.keywords feature selection
gdc.oaire.keywords machine learning
gdc.oaire.keywords classification
gdc.oaire.keywords inflammatory bowel disease
gdc.oaire.keywords gut microbiome
gdc.oaire.keywords colorectal cancer
gdc.oaire.keywords type 2 diabetes
gdc.oaire.keywords Microbiology
gdc.oaire.keywords QR1-502
gdc.oaire.popularity 1.030178E-8
gdc.oaire.publicfunded false
gdc.oaire.views 115
gdc.openalex.collaboration International
gdc.openalex.fwci 1.7495
gdc.openalex.normalizedpercentile 0.85
gdc.opencitations.count 9
gdc.plumx.mendeley 23
gdc.plumx.pubmedcites 1
gdc.plumx.scopuscites 13
gdc.scopus.citedcount 13
gdc.virtual.author Güngör, Burcu
gdc.wos.citedcount 9
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef

Files

Original bundle

Name: fmicb-14-1264941.pdf
Size: 10.09 MB
Format: Adobe Portable Document Format
Description: Article file

License bundle

Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission