PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach

dc.contributor.author Yousef, Malik
dc.contributor.author Ozdemir, Fatma
dc.contributor.author Jaber, Amhar
dc.contributor.author Allmer, Jens
dc.contributor.author Bakir-Gungor, Burcu
dc.contributor.authorID 0000-0002-5146-8207 en_US
dc.contributor.authorID 0000-0002-2272-6270 en_US
dc.contributor.department AGÜ, Faculty of Engineering, Department of Computer Engineering en_US
dc.contributor.institutionauthor Ozdemir, Fatma
dc.contributor.institutionauthor Jaber, Amhar
dc.contributor.institutionauthor Bakir-Gungor, Burcu
dc.date.accessioned 2023-07-14T13:47:44Z
dc.date.available 2023-07-14T13:47:44Z
dc.date.issued 2023 en_US
dc.description.abstract Background: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases. Results: PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research. Conclusions: PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine. en_US
dc.identifier.endpage 24 en_US
dc.identifier.issn 1471-2105
dc.identifier.issue 1 en_US
dc.identifier.other WOS:000937719700003
dc.identifier.startpage 1 en_US
dc.identifier.uri https://doi.org/10.1186/s12859-023-05187-2
dc.identifier.uri https://hdl.handle.net/20.500.12573/1628
dc.identifier.volume 24 en_US
dc.language.iso eng en_US
dc.publisher BMC en_US
dc.relation.isversionof 10.1186/s12859-023-05187-2 en_US
dc.relation.journal BMC BIOINFORMATICS en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Feature selection en_US
dc.subject Feature scoring en_US
dc.subject Feature grouping en_US
dc.subject Biological knowledge integration en_US
dc.subject KEGG pathway en_US
dc.subject Classification en_US
dc.subject Gene expression en_US
dc.subject Enrichment analysis en_US
dc.subject Machine learning en_US
dc.subject Bioinformatics en_US
dc.subject Data science en_US
dc.subject Data mining en_US
dc.subject Genomics en_US
dc.title PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach en_US
dc.type article en_US

Files

Original bundle

Name:
s12859-023-05187-2.pdf
Size:
1.7 MB
Format:
Adobe Portable Document Format
Description:
Article File

License bundle

Name:
license.txt
Size:
1.44 KB
Format:
Item-specific license agreed upon to submission