A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets
Date
2016
Publisher
IEEE
Open Access Color
Green Open Access
Yes
OpenAIRE Downloads
72
OpenAIRE Views
43
Publicly Funded
No
Abstract
In this paper, we propose a runtime-based selective task replication technique for task-parallel high-performance computing applications. The technique is automatic and does not require modifying or recompiling the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
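The idea of replicating only enough tasks to meet a reliability target can be illustrated with a minimal sketch. It is not taken from the paper and should not be read as the App_FIT algorithm itself. It assumes that each task carries a hypothetical failure-rate estimate (`fit`), that a replicated task contributes nothing to the application's failure rate, and that the target is pro-rated evenly over the task sequence. The names `Task` and `select_for_replication` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    fit: float  # hypothetical per-task failure-rate estimate (assumption)


def select_for_replication(tasks, target_fit):
    """Decide, task by task, which tasks to replicate.

    Illustrative greedy rule (an assumption, not the paper's method):
    after the i-th of n tasks, the failure rate accumulated by
    unreplicated tasks may not exceed target_fit * i / n. A task that
    would break this budget is replicated and is assumed to add nothing.
    """
    n = len(tasks)
    unprotected = 0.0  # failure rate accumulated by unreplicated tasks
    replicated = []
    for i, task in enumerate(tasks, start=1):
        budget = target_fit * i / n
        if unprotected + task.fit > budget:
            replicated.append(task.name)  # protect this task
        else:
            unprotected += task.fit  # run it without a replica
    return replicated, unprotected


tasks = [Task("A", 4.0), Task("B", 1.0), Task("C", 3.0), Task("D", 2.0)]
replicated, unprotected = select_for_replication(tasks, target_fit=5.0)
print(replicated, unprotected)
```

In this toy run only two of the four tasks are replicated, and the accumulated failure rate of the unreplicated tasks stays below the target. That mirrors the abstract's observation that complete replication is not needed to meet a reliability target.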
Description
Subasi, Omer/0000-0002-5373-7570;
ORCID
Keywords
UPC subject areas::Computer science::Computer architecture, Parallel processing (Electronic computers), Selective replication, Task parallelism, Dataflow programming, HPC and exascale computing
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
Citation
WoS Q
N/A
Scopus Q
Q3

OpenCitations Citation Count
6
Source
IEEE International Conference on Cluster Computing (CLUSTER), September 13-15, 2016, Taipei, Taiwan
Start Page
498
End Page
505