Repository logoGCRIS
  • English
  • Türkçe
  • Русский
Log In
New user? Click here to register. Have you forgotten your password?
Home
Communities
Browse GCRIS
Entities
Overview
GCRIS Guide
  1. Home
  2. Browse by Author

Browsing by Author "Zyulkyarov, Ferad"

Filter results by typing the first few letters
Now showing 1 - 2 of 2
  • Results Per Page
  • Sort Options
  • Loading...
    Thumbnail Image
    Conference Object
    Citation - WoS: 21
    Citation - Scopus: 25
    Designing and Modelling Selective Replication for Fault-Tolerant HPC Applications
    (IEEE, 2017) Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus
    Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However few studies address both types of errors together. In this paper we propose a software-based selective replication technique for HPC applications for both fail-stop errors and SDCs. Since complete replication of applications can be costly in terms of resources, we develop a runtime-based technique for selective replication. Selective replication provides an opportunity to meet HPC reliability targets while decreasing resource costs. Our technique is low-overhead, automatic and completely transparent to the user.
  • Loading...
    Thumbnail Image
    Conference Object
    Citation - WoS: 6
    Citation - Scopus: 7
    A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets
    (IEEE, 2016) Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus
    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification/recompilation of OS, compiler or application code. Our heuristic, we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
Repository logo
Collections
  • Scopus Collection
  • WoS Collection
  • TrDizin Collection
  • PubMed Collection
Entities
  • Research Outputs
  • Organizations
  • Researchers
  • Projects
  • Awards
  • Equipments
  • Events
About
  • Contact
  • GCRIS
  • Research Ecosystems
  • Feedback
  • OAI-PMH

Log in to GCRIS Dashboard

Powered by Research Ecosystems

  • Privacy policy
  • End User Agreement
  • Feedback