Integrating Biological Domain Knowledge With Machine Learning for Identifying Colorectal-Cancer Microbial Enzymes in Metagenomic Data

dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Ersoz, Nur Sebnem
dc.contributor.author Yousef, Malik
dc.date.accessioned 2025-09-25T10:49:02Z
dc.date.available 2025-09-25T10:49:02Z
dc.date.issued 2025
dc.description.abstract Advances in metagenomics have revolutionized our ability to elucidate links between the microbiome and human diseases. Colorectal cancer (CRC), a leading cause of cancer-related mortality worldwide, has been associated with dysbiosis of the gut microbiome. This study aims to develop a method for identifying CRC-associated microbial enzymes by incorporating biological domain knowledge into the feature selection process. Conventional feature selection techniques often evaluate features individually and fail to leverage biological knowledge during metagenomic data analysis. To address this gap, we propose the enzyme commission (EC)-nomenclature-based Grouping-Scoring-Modeling (G-S-M) method, which integrates biological domain knowledge into feature grouping and selection. The proposed method was tested on a CRC-associated metagenomic dataset collected from eight different countries. Community-level relative abundance values of enzymes were considered as features and grouped based on their EC categories to provide biologically informed groupings. Our findings in randomized 10-fold cross-validation experiments imply that glycosidases, CoA-transferases, hydro-lyases, oligo-1,6-glucosidase, crotonobetainyl-CoA hydratase, and citrate CoA-transferase enzymes can be associated with CRC development as part of different molecular pathways. These enzymes are mostly synthesized by Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Staphylococcus aureus, Streptococcus pneumoniae, and Clostridioides difficile. Comparative evaluation experiments showed that the proposed model consistently outperforms traditional feature selection methods paired with various classifiers. en_US
dc.description.sponsorship The Scientific and Technological Research Council of Turkiye (TUBITAK) 2211A BIDEP program; Abdullah Gul University Support Foundation (AGUV); L'Oreal-UNESCO Young Women Scientist Award; L'Oreal-UNESCO Young Women Scientist Program; Zefat Academic College en_US
dc.description.sponsorship We would like to thank The Scientific and Technological Research Council of Turkiye (TUBITAK) 2211A BIDEP program for supporting the work of N.S.E. The work of B.B.-G. has also been supported by the Abdullah Gul University Support Foundation (AGUV). B.B.-G. would like to express her gratitude for the L'Oreal-UNESCO Young Women Scientist Award. This research was made possible by the support of the L'Oreal-UNESCO Young Women Scientist Program. The work of M.Y. has been supported by Zefat Academic College. en_US
dc.description.sponsorship Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TÜBİTAK; Abdullah Gul University Support Foundation; Zefat Academic College
dc.identifier.doi 10.3390/app15062940
dc.identifier.issn 2076-3417
dc.identifier.scopus 2-s2.0-105000847998
dc.identifier.uri https://doi.org/10.3390/app15062940
dc.identifier.uri https://hdl.handle.net/20.500.12573/4025
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.relation.ispartof Applied Sciences-Basel en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Metagenomic Analysis of Colorectal Cancer en_US
dc.subject Machine Learning en_US
dc.subject Feature Grouping en_US
dc.subject Functional Profiling of Metagenomes en_US
dc.subject Community-Level Enzyme Commission (EC) Abundances en_US
dc.title Integrating Biological Domain Knowledge With Machine Learning for Identifying Colorectal-Cancer Microbial Enzymes in Metagenomic Data en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Bakir-Gungor, Burcu/0000-0002-2272-6270
gdc.author.id ERSÖZ, NUR ŞEBNEM/0000-0003-3343-9936
gdc.author.id Yousef, Malik/0000-0001-8780-6303
gdc.author.scopusid 25932029800
gdc.author.scopusid 57423006700
gdc.author.scopusid 14029389000
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Bakir-Gungor, Burcu] Abdullah Gul Univ, Dept Comp Engn, Fac Engn, TR-38080 Kayseri, Turkiye; [Ersoz, Nur Sebnem] Abdullah Gul Univ, Grad Sch Engn & Sci, Bioengn Dept, TR-38080 Kayseri, Turkiye; [Yousef, Malik] Zefat Acad Coll, Dept Informat Syst, IL-1320611 Safed, Israel; [Yousef, Malik] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, IL-1320611 Safed, Israel en_US
gdc.description.issue 6 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.startpage 2940
gdc.description.volume 15 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q2
gdc.identifier.openalex W4408278013
gdc.identifier.wos WOS:001453565100001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.5587505E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Technology
gdc.oaire.keywords QH301-705.5
gdc.oaire.keywords T
gdc.oaire.keywords Physics
gdc.oaire.keywords QC1-999
gdc.oaire.keywords Engineering (General). Civil engineering (General)
gdc.oaire.keywords metagenomic analysis of colorectal cancer
gdc.oaire.keywords Chemistry
gdc.oaire.keywords machine learning
gdc.oaire.keywords functional profiling of metagenomes
gdc.oaire.keywords feature grouping
gdc.oaire.keywords TA1-2040
gdc.oaire.keywords Biology (General)
gdc.oaire.keywords QD1-999
gdc.oaire.keywords community-level enzyme commission (EC) abundances
gdc.oaire.keywords Feature grouping
gdc.oaire.keywords Community-level enzyme commission (EC) abundances
gdc.oaire.keywords Metagenomic analysis of colorectal cancer
gdc.oaire.keywords Machine learning
gdc.oaire.keywords Functional profiling of metagenomes
gdc.oaire.popularity 3.512786E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 0.8432
gdc.openalex.normalizedpercentile 0.7
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 1
gdc.plumx.mendeley 5
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Güngör, Burcu
gdc.wos.citedcount 0
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef

Files

Original bundle

Name:
applsci-15-02940-v2.pdf
Size:
13.25 MB
Format:
Adobe Portable Document Format
Description:
Article File

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed upon to submission