CRC-based Memory Reliability for Task-parallel HPC Applications

dc.contributor.author Subasi, Omer
dc.contributor.author Unsal, Osman
dc.contributor.author Labarta, Jesus
dc.contributor.author Yalcin, Gulay
dc.contributor.author Cristal, Adrian
dc.contributor.department AGÜ, Faculty of Engineering, Department of Computer Engineering en_US
dc.contributor.institutionauthor Yalcin, Gulay
dc.date.accessioned 2021-11-10T07:00:44Z
dc.date.available 2021-11-10T07:00:44Z
dc.date.issued 2016 en_US
dc.description.abstract Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale systems. For memory systems, Error Correcting Codes (ECC) are the most commonly used mechanism. However, state-of-the-art hardware ECCs will not be sufficient in terms of error coverage for future computing systems, and stronger hardware ECCs providing more coverage have prohibitive costs in terms of area, power and latency. Software-based solutions are needed to cooperate with hardware. In this work, we propose a Cyclic Redundancy Check (CRC) based software mechanism for task-parallel HPC applications. Our mechanism incurs only 1.7% performance overhead with hardware acceleration while being highly scalable at large scale. Our mathematical analysis demonstrates the effectiveness of our scheme and its error coverage. Results show that our CRC-based mechanism reduces the memory vulnerability by 87% on average with up to 32-bit burst (consecutive) and 5-bit arbitrary error correction capability. en_US
dc.description.sponsorship IEEE; IEEE Comp Soc, Tech Comm Parallel Proc; ACM SIGARCH; IEEE Comp Soc Tech Comm Comp Architecture; IEEE Comp Soc Tech Comm Distributed Proc en_US
dc.identifier.endpage 1112 en_US
dc.identifier.isbn 978-1-5090-2140-6
dc.identifier.issn 1530-2075
dc.identifier.startpage 1101 en_US
dc.identifier.uri https://doi.org/10.1109/IPDPS.2016.70
dc.identifier.uri https://hdl.handle.net/20.500.12573/1020
dc.language.iso eng en_US
dc.publisher IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA en_US
dc.relation.isversionof 10.1109/IPDPS.2016.70 en_US
dc.relation.journal 2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016) en_US
dc.relation.publicationcategory Article - International - Editor-Reviewed Journal en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.title CRC-based Memory Reliability for Task-parallel HPC Applications en_US
dc.title.alternative Book Series: International Parallel and Distributed Processing Symposium IPDPS en_US
dc.type bookPart en_US

Files

Original bundle

Name:
CRC-based Memory Reliability for Task-parallel HPC Applications.pdf
Size:
1.14 MB
Format:
Adobe Portable Document Format
Description:
Book Chapter
