Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data

dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.author Ersoz, Nur Sebnem
dc.contributor.author Yousef, Malik
dc.contributor.authorID 0000-0002-2272-6270 en_US
dc.contributor.department AGÜ, Faculty of Life and Natural Sciences, Department of Molecular Biology and Genetics en_US
dc.contributor.institutionauthor Bakir-Gungor, Burcu
dc.contributor.institutionauthor Ersoz, Nur Sebnem
dc.date.accessioned 2025-06-17T08:27:12Z
dc.date.available 2025-06-17T08:27:12Z
dc.date.issued 2025 en_US
dc.description.abstract Advances in metagenomics have revolutionized our ability to elucidate links between the microbiome and human diseases. Colorectal cancer (CRC), a leading cause of cancer-related mortality worldwide, has been associated with dysbiosis of the gut microbiome. This study aims to develop a method for identifying CRC-associated microbial enzymes by incorporating biological domain knowledge into the feature selection process. Conventional feature selection techniques often evaluate features individually and fail to leverage biological knowledge during metagenomic data analysis. To address this gap, we propose the enzyme commission (EC)-nomenclature-based Grouping-Scoring-Modeling (G-S-M) method, which integrates biological domain knowledge into feature grouping and selection. The proposed method was tested on a CRC-associated metagenomic dataset collected from eight different countries. Community-level relative abundance values of enzymes were considered as features and grouped based on their EC categories to provide biologically informed groupings. Our findings in randomized 10-fold cross-validation experiments imply that glycosidases, CoA-transferases, hydro-lyases, oligo-1,6-glucosidase, crotonobetainyl-CoA hydratase, and citrate CoA-transferase enzymes may be associated with CRC development as part of different molecular pathways. These enzymes are mostly synthesized by Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Staphylococcus aureus, Streptococcus pneumoniae, and Clostridioides difficile. Comparative evaluation experiments showed that the proposed model consistently outperforms traditional feature selection methods paired with various classifiers. en_US
dc.description.sponsorship We would like to thank The Scientific and Technological Research Council of Türkiye (TÜBİTAK) 2211A BIDEP program for supporting the work of N.S.E. The work of B.B.-G. has also been supported by the Abdullah Gul University Support Foundation (AGUV). B.B.-G. would like to express her gratitude for the L’Oréal-UNESCO Young Women Scientist Award. This research was made possible by the support of the L’Oréal-UNESCO Young Women Scientist Program. The work of M.Y. has been supported by Zefat Academic College. en_US
dc.identifier.endpage 37 en_US
dc.identifier.issn 2076-3417
dc.identifier.issue 6 en_US
dc.identifier.startpage 1 en_US
dc.identifier.uri https://doi.org/10.3390/app15062940
dc.identifier.uri https://hdl.handle.net/20.500.12573/2540
dc.identifier.volume 15 en_US
dc.language.iso eng en_US
dc.publisher MDPI en_US
dc.relation.isversionof 10.3390/app15062940 en_US
dc.relation.journal APPLIED SCIENCES-BASEL en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.relation.tubitak 2211A BIDEP
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Metagenomic analysis of colorectal cancer en_US
dc.subject Machine learning en_US
dc.subject Feature grouping en_US
dc.subject Functional profiling of metagenomes en_US
dc.subject Community-level enzyme commission (EC) abundances en_US
dc.title Integrating Biological Domain Knowledge with Machine Learning for Identifying Colorectal-Cancer-Associated Microbial Enzymes in Metagenomic Data en_US
dc.type article en_US
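The abstract states that community-level enzyme abundances were grouped by their EC categories before scoring and modeling. The Python sketch below illustrates only that grouping idea, under stated assumptions: features are named by four-field EC numbers with an optional "EC:" prefix, grouping is done at the third EC level (for example, EC 3.2.1 covers glycosidases), and the helper names `ec_group_key`, `group_features`, and `group_abundance` are hypothetical. The per-group sum is an illustrative aggregation, not the scoring step of G-S-M.

```python
from collections import defaultdict


def ec_group_key(ec_number: str, level: int = 3) -> str:
    """Truncate an EC number to its first `level` fields, e.g. 'EC:3.2.1.10' -> '3.2.1'."""
    # Assumption: identifiers look like 'EC:a.b.c.d' or 'a.b.c.d'.
    fields = ec_number.removeprefix("EC:").split(".")
    if len(fields) != 4 or not 1 <= level <= 4:
        raise ValueError(f"cannot group {ec_number!r} at level {level}")
    return ".".join(fields[:level])


def group_features(ec_numbers, level: int = 3) -> dict[str, list[str]]:
    """Map each EC category (prefix) to the enzyme features that fall under it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ec in ec_numbers:
        groups[ec_group_key(ec, level)].append(ec)
    return dict(groups)


def group_abundance(sample: dict[str, float], groups: dict[str, list[str]]) -> dict[str, float]:
    """Sum one sample's relative abundances within each group (illustrative aggregation only)."""
    # Enzymes absent from the sample contribute zero abundance.
    return {g: sum(sample.get(ec, 0.0) for ec in members) for g, members in groups.items()}


if __name__ == "__main__":
    # Illustrative EC identifiers and abundance values, not data from the study.
    features = ["EC:3.2.1.10", "EC:3.2.1.20", "EC:2.8.3.10", "EC:4.2.1.17"]
    groups = group_features(features)
    print(groups)
    sample = {"EC:3.2.1.10": 0.25, "EC:3.2.1.20": 0.125, "EC:2.8.3.10": 0.5}
    print(group_abundance(sample, groups))
```

At level 3, EC 3.2.1.10 and EC 3.2.1.20 land in the same group, so the downstream stages see one biologically coherent feature set per EC category; the G-S-M scoring and modeling stages that rank and select such groups are not reproduced here.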

Files

Original bundle

Name:
applsci-15-02940-v2.pdf
Size:
13.25 MB
Format:
Adobe Portable Document Format
Description:
Article file

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed to upon submission
Description: