A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs
| dc.contributor.author | Cini, Nevin | |
| dc.contributor.author | Yalcin, Gulay | |
| dc.date.accessioned | 2025-09-25T10:39:03Z | |
| dc.date.available | 2025-09-25T10:39:03Z | |
| dc.date.issued | 2020 | |
| dc.description | Cini, Nevin/0000-0001-5348-4043 | en_US |
| dc.description.abstract | Today, GPUs are widely used as coprocessors/accelerators in high-performance heterogeneous computing because of their many advantages. However, many studies emphasize that GPUs are not yet as reliable as desired. Although GPUs are more vulnerable to hardware errors than CPUs, their use in HPCs continues to grow. Moreover, because of the inherent reliability problems of GPUs, combining a large number of GPUs with CPUs can significantly increase the failure rates of HPCs. For this reason, analyzing the reliability characteristics of GPU-based HPCs has become an important issue, and in this study we evaluate the reliability of GPU-based HPCs. To do so, we first examined field data analysis studies for GPU-based and CPU-based HPCs and identified factors that could increase system failure/error rates. We then used these factors to compare GPU-based HPCs with CPU-based HPCs in terms of reliability, in order to point out the reliability challenges of GPU-based HPCs. Our primary goal is to present a study that can guide researchers in this field by describing the current state of GPU-based heterogeneous HPCs and their future requirements in terms of reliability. Our second goal is to offer a methodology for comparing the reliability of GPU-based and CPU-based HPCs. To the best of our knowledge, this is the first survey to compare the reliability of GPU-based and CPU-based HPCs in a systematic manner. | en_US |
| dc.identifier.doi | 10.1145/3372790 | |
| dc.identifier.issn | 0360-0300 | |
| dc.identifier.issn | 1557-7341 | |
| dc.identifier.scopus | 2-s2.0-85079570950 | |
| dc.identifier.uri | https://doi.org/10.1145/3372790 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12573/3091 | |
| dc.language.iso | en | en_US |
| dc.publisher | Association for Computing Machinery | en_US |
| dc.relation.ispartof | ACM Computing Surveys | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | System Failure | en_US |
| dc.subject | Log File Analysis | en_US |
| dc.subject | Checkpoint/Recovery | en_US |
| dc.title | A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs | en_US |
| dc.type | Article | en_US |
| dspace.entity.type | Publication | |
| gdc.author.id | Cini, Nevin/0000-0001-5348-4043 | |
| gdc.author.scopusid | 57214945723 | |
| gdc.author.scopusid | 23029394200 | |
| gdc.author.wosid | Cini, Nevin/Nxc-5067-2025 | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C4 | |
| gdc.bip.popularityclass | C4 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::journal::journal article | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | Abdullah Gül University | en_US |
| gdc.description.departmenttemp | [Cini, Nevin; Yalcin, Gulay] Abdullah Gul Univ, TR-38080 Kayseri, Turkey | en_US |
| gdc.description.endpage | 33 | |
| gdc.description.issue | 1 | en_US |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | Q1 | |
| gdc.description.startpage | 1 | |
| gdc.description.volume | 53 | en_US |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q1 | |
| gdc.identifier.openalex | W3004697822 | |
| gdc.identifier.wos | WOS:000582585800022 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 3.0 | |
| gdc.oaire.influence | 3.6565426E-9 | |
| gdc.oaire.isgreen | true | |
| gdc.oaire.keywords | Software organization and properties | |
| gdc.oaire.keywords | Computer systems organization | |
| gdc.oaire.keywords | log file analysis | |
| gdc.oaire.keywords | Cross-computing tools and techniques | |
| gdc.oaire.keywords | High Performance Computing | |
| gdc.oaire.keywords | checkpoint/recovery | |
| gdc.oaire.keywords | Hardware test | |
| gdc.oaire.keywords | Reliability | |
| gdc.oaire.keywords | Dependable and fault-tolerant systems and networks | |
| gdc.oaire.keywords | Extra-functional properties | |
| gdc.oaire.keywords | Hardware | |
| gdc.oaire.keywords | High-performance computing | |
| gdc.oaire.keywords | System failure | |
| gdc.oaire.keywords | Software and its engineering | |
| gdc.oaire.keywords | failure prediction | |
| gdc.oaire.keywords | Robustness | |
| gdc.oaire.keywords | Graphics Processing Unit | |
| gdc.oaire.popularity | 8.546006E-9 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 01 natural sciences | |
| gdc.oaire.sciencefields | 0103 physical sciences | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 0.29461312 | |
| gdc.openalex.normalizedpercentile | 0.55 | |
| gdc.opencitations.count | 4 | |
| gdc.plumx.crossrefcites | 4 | |
| gdc.plumx.mendeley | 21 | |
| gdc.plumx.scopuscites | 10 | |
| gdc.scopus.citedcount | 10 | |
| gdc.virtual.author | Yalçın Alkan, Gülay | |
| gdc.wos.citedcount | 6 | |
| relation.isAuthorOfPublication | e0dc9e40-f936-402f-96c6-f4e668a0b9d3 | |
| relation.isAuthorOfPublication.latestForDiscovery | e0dc9e40-f936-402f-96c6-f4e668a0b9d3 | |
| relation.isOrgUnitOfPublication | 665d3039-05f8-4a25-9a3c-b9550bffecef | |
| relation.isOrgUnitOfPublication | 52f507ab-f278-4a1f-824c-44da2a86bd51 | |
| relation.isOrgUnitOfPublication | ef13a800-4c99-4124-81e0-3e25b33c0c2b | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 665d3039-05f8-4a25-9a3c-b9550bffecef |
