A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs

dc.contributor.author Cini, Nevin
dc.contributor.author Yalcin, Gulay
dc.date.accessioned 2025-09-25T10:39:03Z
dc.date.available 2025-09-25T10:39:03Z
dc.date.issued 2020
dc.description Cini, Nevin/0000-0001-5348-4043 en_US
dc.description.abstract Today, GPUs are widely used as coprocessors/accelerators in high-performance heterogeneous computing because of their many advantages. However, many studies emphasize that GPUs are not yet as reliable as desired. Although GPUs are more vulnerable to hardware errors than CPUs, their use in HPCs continues to grow. Moreover, because of the inherent reliability problems of GPUs, combining a large number of GPUs with CPUs can significantly increase the failure rates of HPCs. For this reason, analyzing the reliability characteristics of GPU-based HPCs has become an important issue. In this study, we therefore evaluate the reliability of GPU-based HPCs. To this end, we first examined field data analysis studies for GPU-based and CPU-based HPCs and identified factors that could increase system failure/error rates. We then used these factors to compare GPU-based HPCs with CPU-based HPCs in terms of reliability, in order to point out the reliability challenges of GPU-based HPCs. Our primary goal is to present a study that can guide researchers in this field by indicating the current state of GPU-based heterogeneous HPCs and the requirements for the future in terms of reliability. Our second goal is to offer a methodology for comparing the reliability of GPU-based HPCs and CPU-based HPCs. To the best of our knowledge, this is the first survey to compare the reliability of GPU-based and CPU-based HPCs in a systematic manner. en_US
dc.identifier.doi 10.1145/3372790
dc.identifier.issn 0360-0300
dc.identifier.issn 1557-7341
dc.identifier.scopus 2-s2.0-85079570950
dc.identifier.uri https://doi.org/10.1145/3372790
dc.identifier.uri https://hdl.handle.net/20.500.12573/3091
dc.language.iso en en_US
dc.publisher Assoc Computing Machinery en_US
dc.relation.ispartof ACM Computing Surveys en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject System Failure en_US
dc.subject Log File Analysis en_US
dc.subject Checkpoint/Recovery en_US
dc.title A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Cini, Nevin/0000-0001-5348-4043
gdc.author.scopusid 57214945723
gdc.author.scopusid 23029394200
gdc.author.wosid Cn, Nn/Nxc-5067-2025
gdc.bip.impulseclass C5
gdc.bip.influenceclass C4
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Abdullah Gül University en_US
gdc.description.departmenttemp [Cini, Nevin; Yalcin, Gulay] Abdullah Gul Univ, TR-38080 Kayseri, Turkey en_US
gdc.description.endpage 33
gdc.description.issue 1 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.startpage 1
gdc.description.volume 53 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W3004697822
gdc.identifier.wos WOS:000582585800022
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 3.0
gdc.oaire.influence 3.6565426E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Software organization and properties
gdc.oaire.keywords Computer systems organization
gdc.oaire.keywords log file analysis
gdc.oaire.keywords Cross-computing tools and techniques
gdc.oaire.keywords High Performance Computing
gdc.oaire.keywords checkpoint/recovery
gdc.oaire.keywords Hardware test
gdc.oaire.keywords Reliability
gdc.oaire.keywords Dependable and fault-tolerant systems and networks
gdc.oaire.keywords Extra-functional properties
gdc.oaire.keywords Hardware
gdc.oaire.keywords High-performance computing (Yüksek başarımlı hesaplama)
gdc.oaire.keywords System failure
gdc.oaire.keywords Software and its engineering
gdc.oaire.keywords failure prediction
gdc.oaire.keywords Robustness
gdc.oaire.keywords Graphics Processing Unit
gdc.oaire.popularity 8.546006E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 01 natural sciences
gdc.oaire.sciencefields 0103 physical sciences
gdc.openalex.collaboration National
gdc.openalex.fwci 0.29461312
gdc.openalex.normalizedpercentile 0.55
gdc.opencitations.count 4
gdc.plumx.crossrefcites 4
gdc.plumx.mendeley 21
gdc.plumx.scopuscites 10
gdc.scopus.citedcount 10
gdc.virtual.author Yalçın Alkan, Gülay
gdc.wos.citedcount 6
relation.isAuthorOfPublication e0dc9e40-f936-402f-96c6-f4e668a0b9d3
relation.isAuthorOfPublication.latestForDiscovery e0dc9e40-f936-402f-96c6-f4e668a0b9d3
relation.isOrgUnitOfPublication 665d3039-05f8-4a25-9a3c-b9550bffecef
relation.isOrgUnitOfPublication 52f507ab-f278-4a1f-824c-44da2a86bd51
relation.isOrgUnitOfPublication ef13a800-4c99-4124-81e0-3e25b33c0c2b
relation.isOrgUnitOfPublication.latestForDiscovery 665d3039-05f8-4a25-9a3c-b9550bffecef

Files