Integrating Biological Domain Knowledge With Machine Learning for Identifying Colorectal-Cancer Microbial Enzymes in Metagenomic Data

Date

2025

Publisher

MDPI

Open Access Color

GOLD

Green Open Access

Yes

Publicly Funded

No

Abstract

Advances in metagenomics have revolutionized our ability to elucidate links between the microbiome and human diseases. Colorectal cancer (CRC), a leading cause of cancer-related mortality worldwide, has been associated with dysbiosis of the gut microbiome. This study aims to develop a method for identifying CRC-associated microbial enzymes by incorporating biological domain knowledge into the feature selection process. Conventional feature selection techniques often evaluate features individually and fail to leverage biological knowledge during metagenomic data analysis. To address this gap, we propose the enzyme commission (EC)-nomenclature-based Grouping-Scoring-Modeling (G-S-M) method, which integrates biological domain knowledge into feature grouping and selection. The proposed method was tested on a CRC-associated metagenomic dataset collected in eight countries. Community-level relative abundance values of enzymes were used as features and grouped by their EC categories to provide biologically informed groupings. Results from randomized 10-fold cross-validation experiments suggest that glycosidases, CoA-transferases, hydro-lyases, oligo-1,6-glucosidase, crotonobetainyl-CoA hydratase, and citrate CoA-transferase may be associated with CRC development through different molecular pathways. These enzymes are mainly produced by Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Staphylococcus aureus, Streptococcus pneumoniae, and Clostridioides difficile. Comparative evaluation experiments showed that the proposed model consistently outperforms traditional feature selection methods paired with various classifiers.
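The abstract describes grouping enzyme features by their EC categories and then scoring and modeling at the group level, but it does not spell out the scoring rule. The following is a minimal sketch under stated assumptions: EC numbers are truncated to a chosen hierarchy level to form groups (for example, EC 3.2.1 is the glycosidase subclass), and each group is scored by randomized 10-fold cross-validated accuracy of a simple nearest-centroid classifier. The function names (`ec_group`, `group_features`, `cv_accuracy`, `rank_groups`), the nearest-centroid scorer, and the toy data are hypothetical illustrations, not the authors' G-S-M implementation, which may use a different classifier and scoring rule.

```python
import random
from collections import defaultdict
from statistics import mean


def ec_group(ec: str, level: int = 3) -> str:
    """Truncate an EC number to its first `level` fields, e.g. '3.2.1.10' -> '3.2.1'."""
    return ".".join(ec.split(".")[:level])


def group_features(ec_numbers: list[str], level: int = 3) -> dict[str, list[int]]:
    """Map each EC prefix to the column indices of the enzyme features it contains."""
    groups = defaultdict(list)
    for idx, ec in enumerate(ec_numbers):
        groups[ec_group(ec, level)].append(idx)
    return dict(groups)


def cv_accuracy(X, y, cols, k=10, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier on columns `cols`.

    Hypothetical stand-in scorer; a real pipeline would likely use a stronger classifier.
    """
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)  # randomized fold assignment
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in order if i not in held_out]
        # Per-class centroid of this group's features, computed on the training fold only
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            members = [i for i in train if y[i] == label]
            centroids[label] = [mean(X[i][c] for i in members) for c in cols]
        # Assign each held-out sample to the class with the closest centroid
        for i in test_idx:
            point = [X[i][c] for c in cols]
            pred = min(
                centroids,
                key=lambda lab: sum((p - q) ** 2 for p, q in zip(point, centroids[lab])),
            )
            correct += pred == y[i]
    return correct / len(y)


def rank_groups(X, y, ec_numbers, level=3, top=5, k=10):
    """Score every EC group as a unit and return the `top` best groups plus all scores."""
    groups = group_features(ec_numbers, level)
    scores = {g: cv_accuracy(X, y, cols, k) for g, cols in groups.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top], scores


if __name__ == "__main__":
    # Toy data (hypothetical): the two EC 3.2.1.x features carry the class signal,
    # the two EC 2.7.1.x features are pure noise.
    rng = random.Random(42)
    ec_numbers = ["3.2.1.10", "3.2.1.20", "2.7.1.1", "2.7.1.2"]
    y = [i % 2 for i in range(40)]
    X = [
        [lab + rng.gauss(0, 0.1), lab + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.gauss(0, 1)]
        for lab in y
    ]
    best, scores = rank_groups(X, y, ec_numbers, top=1)
    print(best, scores)
```

Because each group is scored as a unit, an enzyme that is weak on its own can still be retained when its EC group as a whole separates CRC from control samples; this is the contrast with per-feature selection that the abstract draws.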

Keywords

Metagenomic Analysis of Colorectal Cancer, Machine Learning, Feature Grouping, Functional Profiling of Metagenomes, Community-Level Enzyme Commission (EC) Abundances, Technology, QH301-705.5, T, Physics, QC1-999, Engineering (General). Civil engineering (General), Chemistry, TA1-2040, Biology (General), QD1-999

WoS Q

Q2

Scopus Q

Q2

Source

Applied Sciences-Basel

Volume

15

Issue

6

Start Page

2940

Sustainable Development Goals

3

GOOD HEALTH AND WELL-BEING