PriPath: Identifying Dysregulated Pathways From Differential Gene Expression via Grouping, Scoring, and Modeling With an Embedded Feature Selection Approach

dc.contributor.author Yousef, Malik
dc.contributor.author Ozdemir, Fatma
dc.contributor.author Jaber, Amhar
dc.contributor.author Allmer, Jens
dc.contributor.author Bakir-Gungor, Burcu
dc.date.accessioned 2025-09-25T10:55:32Z
dc.date.available 2025-09-25T10:55:32Z
dc.date.issued 2023
dc.description Allmer, Jens/0000-0002-2164-7335; Yousef, Malik/0000-0001-8780-6303 en_US
dc.description.abstract Background: Cell homeostasis relies on the concerted action of genes, and dysregulated genes can lead to disease. In living organisms, genes and their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Gene expression measurements (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes that act in multiple pathways, along with other inherent issues, complicate such analyses. Many current approaches employ only gene expression data and pay too little attention to the knowledge already stored in KEGG pathways when detecting dysregulated pathways. New methods that incorporate more of this precompiled information are required to associate gene expression with disease more holistically. Results: PriPath is a novel approach that applies the generic process of grouping and scoring, followed by modeling, to the analysis of gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is trained on those groups to differentiate between disease and control samples. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our approach assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have also compared the performance of PriPath with that of other tools with a similar purpose. For each dataset, we manually checked the top results of PriPath against the literature and found that most predictions are supported by previous experimental research. Conclusions: PriPath can thus aid in determining dysregulated pathways, which is relevant to medical diagnostics. In the future, we aim to extend this approach so that it can perform patient stratification based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine. en_US
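The grouping, scoring, and modeling idea summarized in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not PriPath's implementation: the nearest-centroid classifier, 3-fold cross-validated accuracy as the group score, the top_k cutoff, and all gene and pathway names (G1 to G4, pathway_A to pathway_C) are hypothetical stand-ins chosen for illustration. Each pathway is treated as a group of genes, each group is scored by how well its genes alone separate disease from control samples, and a final model is fitted on the genes of the best-scoring groups.

```python
"""Toy grouping-scoring-modeling sketch with pathways as gene groups.

Hypothetical example only: the nearest-centroid classifier, 3-fold
cross-validation, and all gene/pathway names are assumptions, not
PriPath's actual implementation.
"""
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return one mean vector per class label (a deliberately simple classifier)."""
    centroids = {}
    for label in sorted(set(y)):  # sorted so tie-breaking is deterministic
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def subset(X, cols):
    """Keep only the given feature columns of every sample."""
    return [[row[j] for j in cols] for row in X]


def cv_accuracy(X, y, folds=3, seed=0):
    """Scoring step: k-fold cross-validated accuracy using only the given features."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        train = [i for i in idx if i not in test]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


def grouping_scoring_modeling(X, y, genes, groups, top_k=2):
    """Group genes by pathway, score each group, then fit a model on the top_k groups."""
    col = {g: j for j, g in enumerate(genes)}
    scores = {}
    for name, members in groups.items():
        cols = [col[g] for g in members if g in col]  # ignore unmeasured genes
        if cols:
            scores[name] = cv_accuracy(subset(X, cols), y)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))  # best first, ties by name
    selected = ranked[:top_k]
    final_cols = sorted({col[g] for n in selected for g in groups[n] if g in col})
    model = nearest_centroid_fit(subset(X, final_cols), y)
    return ranked, scores, model, final_cols


# Hypothetical toy data: 3 control and 3 disease samples, 4 genes.
TOY_GENES = ["G1", "G2", "G3", "G4"]
TOY_X = [
    [1.0, 1.2, 5.0, 3.1],
    [0.8, 1.0, 2.0, 4.0],
    [1.1, 0.9, 4.2, 2.5],
    [9.5, 10.2, 4.8, 2.9],
    [10.1, 9.8, 2.2, 3.8],
    [9.9, 10.5, 4.0, 2.7],
]
TOY_Y = ["control"] * 3 + ["disease"] * 3
TOY_GROUPS = {
    "pathway_A": ["G1", "G2"],
    "pathway_B": ["G3", "G4"],
    "pathway_C": ["G2", "G3", "G_unmeasured"],
}

if __name__ == "__main__":
    ranked, scores, model, cols = grouping_scoring_modeling(
        TOY_X, TOY_Y, TOY_GENES, TOY_GROUPS, top_k=1)
    print(ranked, scores, cols)
```

Run as a script, the sketch ranks pathway_A first, because only its genes cleanly separate the two toy classes. Real pathway collections overlap and contain far more genes, so a practical tool would likely use a stronger classifier and repeated resampling rather than this single toy split.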
dc.identifier.doi 10.1186/s12859-023-05187-2
dc.identifier.issn 1471-2105
dc.identifier.scopus 2-s2.0-85148796520
dc.identifier.uri https://doi.org/10.1186/s12859-023-05187-2
dc.identifier.uri https://hdl.handle.net/20.500.12573/4474
dc.language.iso en en_US
dc.publisher BMC en_US
dc.relation.ispartof BMC Bioinformatics en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Feature Selection en_US
dc.subject Feature Scoring en_US
dc.subject Feature Grouping en_US
dc.subject Biological Knowledge Integration en_US
dc.subject KEGG Pathway en_US
dc.subject Classification en_US
dc.subject Gene Expression en_US
dc.subject Enrichment Analysis en_US
dc.subject Machine Learning en_US
dc.subject Bioinformatics en_US
dc.subject Data Science en_US
dc.subject Data Mining en_US
dc.subject Genomics en_US
dc.title PriPath: Identifying Dysregulated Pathways From Differential Gene Expression via Grouping, Scoring, and Modeling With an Embedded Feature Selection Approach en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Allmer, Jens/0000-0002-2164-7335
gdc.author.id Yousef, Malik/0000-0001-8780-6303
gdc.author.scopusid 14029389000
gdc.author.scopusid 57799965300
gdc.author.scopusid 58116683400
gdc.author.scopusid 24821311300
gdc.author.scopusid 25932029800
gdc.author.wosid Allmer, Jens/E-2335-2016
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Yousef, Malik] Zefat Acad Coll, Dept Informat Syst, IL-13206 Safed, Israel; [Yousef, Malik] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, Safed, Israel; [Ozdemir, Fatma; Jaber, Amhar; Bakir-Gungor, Burcu] Abdullah Gul Univ, Fac Engn, Dept Comp Engn, Kayseri, Turkiye; [Ozdemir, Fatma] Ruhr Univ, Univ Inst Digital Commun Syst, Bochum, Germany; [Allmer, Jens] Univ Appl Sci, Hsch Ruhr West, Inst Measurement Engn & Sensor Technol, Med Informat & Bioinformat, Mulheim, Germany en_US
gdc.description.issue 1 en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.volume 24 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W4321600314
gdc.identifier.pmid 36823571
gdc.identifier.wos WOS:000937719700003
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 102
gdc.oaire.impulse 17.0
gdc.oaire.influence 3.1040313E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Bioinformatics
gdc.oaire.keywords QH301-705.5
gdc.oaire.keywords Computer applications to medicine. Medical informatics
gdc.oaire.keywords R858-859.7
gdc.oaire.keywords Gene Expression
gdc.oaire.keywords Data science
gdc.oaire.keywords Biological knowledge integration
gdc.oaire.keywords Neoplasms
gdc.oaire.keywords Machine learning
gdc.oaire.keywords Humans
gdc.oaire.keywords KEGG pathway
gdc.oaire.keywords Biology (General)
gdc.oaire.keywords Data mining
gdc.oaire.keywords Enrichment analysis
gdc.oaire.keywords Genome
gdc.oaire.keywords Gene Expression Profiling
gdc.oaire.keywords Feature grouping
gdc.oaire.keywords Computational Biology
gdc.oaire.keywords Genomics
gdc.oaire.keywords Classification
gdc.oaire.keywords Feature selection
gdc.oaire.keywords Gene expression
gdc.oaire.keywords Feature scoring
gdc.oaire.keywords Algorithms
gdc.oaire.keywords Research Article
gdc.oaire.popularity 1.5150752E-8
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.sciencefields 0303 health sciences
gdc.oaire.views 162
gdc.openalex.collaboration International
gdc.openalex.fwci 4.45608978
gdc.openalex.normalizedpercentile 0.94
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 14
gdc.plumx.crossrefcites 6
gdc.plumx.mendeley 18
gdc.plumx.newscount 1
gdc.plumx.pubmedcites 11
gdc.plumx.scopuscites 15
gdc.scopus.citedcount 15
gdc.virtual.author Güngör, Burcu
gdc.wos.citedcount 15
relation.isAuthorOfPublication e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isAuthorOfPublication.latestForDiscovery e17be1f8-1c9a-45f2-bf0d-f8b348d2dba0
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef
