Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes

dc.contributor.author Ersoz, Nur Sebnem
dc.contributor.author Kuzudisli, Cihan
dc.contributor.author Yousef, Malik
dc.contributor.author Bakir-Gungor, Burcu
dc.date.accessioned 2025-09-25T10:50:40Z
dc.date.available 2025-09-25T10:50:40Z
dc.date.issued 2024
dc.description.abstract The human gut microbiome comprises over 10 trillion microbes and plays important roles in maintaining metabolism and body homeostasis and in shaping immune function. Metagenomics, which studies genomic data from clinical and environmental samples, is crucial for understanding the interplay between the host and the gut microbiome. Recently, functional profiling of metagenomes has helped to identify alterations in microbial functions, particularly in enzyme-encoding genes. Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths. In this study, we aimed to find CRC-associated enzymes by analyzing metagenomic data with different machine learning methods. A total of 1262 samples, including CRC and control groups from different countries, were used. The dataset was obtained by functionally profiling metagenomics data and estimating community-level enzyme commission (EC) abundance values. To analyze this dataset, RCE-IFE and SVM-RCE, two group-based feature selection methods, were compared with 6 individual feature selection methods. Monte Carlo cross-validation repeated 10 times was used in our experiments. RCE-IFE, Extreme Gradient Boosting, and Select K Best performed similarly and gave the best results among the compared methods. Beyond its high performance, the group-based feature selection method RCE-IFE, unlike TFS, grouped enzymes into clusters and then identified biologically relevant CRC-associated enzymes. en_US
dc.description.sponsorship Berdan Civata B.C.; et al.; Figes; Koluman; Loodos; Tarsus University
dc.identifier.doi 10.1109/SIU61531.2024.10601144
dc.identifier.isbn 9798350388978
dc.identifier.isbn 9798350388961
dc.identifier.issn 2165-0608
dc.identifier.scopus 2-s2.0-85200856780
dc.identifier.uri https://doi.org/10.1109/SIU61531.2024.10601144
dc.identifier.uri https://hdl.handle.net/20.500.12573/4187
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartof 32nd IEEE Signal Processing and Communications Applications Conference (SIU) -- MAY 15-18, 2024 -- Tarsus Univ Campus, Mersin, TURKEY en_US
dc.relation.ispartofseries Signal Processing and Communications Applications Conference
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Colorectal Cancer Diagnosis en_US
dc.subject Metagenomics Data Analysis en_US
dc.subject Community-Level Enzyme Commission (EC) Abundance Values en_US
dc.subject Machine Learning en_US
dc.subject Grouping Based Feature Selection en_US
dc.title Metagenomic Data Analysis With Machine Learning to Discover Colorectal Cancer-Associated Enzymes en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 57423006700
gdc.author.scopusid 57219838821
gdc.author.scopusid 14029389000
gdc.author.scopusid 25932029800
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Ersoz, Nur Sebnem] Abdullah Gul Univ, Fac Life & Nat Sci, Dept Bioengn, Kayseri, Turkiye; [Kuzudisli, Cihan] Hasan Kalyoncu Univ, Fac Engn, Dept Comp Engn, Gaziantep, Turkiye; [Yousef, Malik] Zefat Acad Coll, Galilee Digital Hlth Res Ctr, Dept Informat Syst, Safed, Israel; [Bakir-Gungor, Burcu] Abdullah Gul Univ, Fac Engn, Dept Comp Engn, Kayseri, Turkiye en_US
gdc.description.endpage 4
gdc.description.publicationcategory Konferans Öğesi - Uluslararası - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality N/A
gdc.description.startpage 1
gdc.description.woscitationindex Conference Proceedings Citation Index - Science
gdc.description.wosquality N/A
gdc.identifier.openalex W4400908949
gdc.identifier.wos WOS:001297894700330
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.keywords machine learning
gdc.oaire.keywords metagenomics data analysis
gdc.oaire.keywords grouping based feature selection
gdc.oaire.keywords colorectal cancer diagnosis
gdc.oaire.keywords community-level enzyme commission (EC) abundance values
gdc.oaire.popularity 2.3737945E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 0303 health sciences
gdc.oaire.sciencefields 03 medical and health sciences
gdc.openalex.collaboration International
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.11
gdc.opencitations.count 0
gdc.plumx.mendeley 1
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Güngör, Burcu
gdc.wos.citedcount 0
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef
