Browsing by Author "Yalcin, Gulay"

Now showing 1 - 5 of 5

CompreCity: Accelerating the Traveling Salesman Problem on GPU with data compression
(ELSEVIER, 2025) Yalcin, Salih; Usul, Hamdi Burak; Yalcin, Gulay; 0000-0001-6476-4542; 0000-0003-3929-8126; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Yalcin, Salih; Yalcin, Gulay
Traveling Salesman Problem (TSP) is one of the significant problems in computer science which tries to find the shortest path for a salesman who needs to visit a set of cities and it is involved in many computing problems such as networks, genome analysis, logistics etc. Using parallel executing paradigms, especially GPUs, is appealing in order to reduce the problem solving time of TSP. One of the main issues in GPUs is to have limited GPU memory which would not be enough for the entire data. Therefore, transferring data from the host device would reduce the performance in execution time. In this study, we applied three data compression methodologies to represent cities in the TSP such as (1) Using Greatest Common Divisor (2) Shift Cities to the Origin (3) Splitting Surface to Grids. Therefore, we include more cities in GPU memory and reduce the number of data transfers from the host device. We implement our methodology in Iterated Local Search (ILS) algorithm with 2-opt and The Lin-Kernighan-Helsgaun (LKH) Algorithm. We show that our implementation presents more than 25% performance improvement for both algorithms.
CRC-based Memory Reliability for Task-parallel HPC Applications
(IEEE345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) Subasi, Omer; Unsal, Osman; Labarta, Jesus; Yalcin, Gulay; Cristal, Adrian; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü; Yalcin, Gulay
Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale systems. For memory systems Error Correcting Codes (ECC) are the most commonly used mechanism. However state-of-the art hardware ECCs will not be sufficient in terms of error coverage for future computing systems and stronger hardware ECCs providing more coverage have prohibitive costs in terms of area, power and latency. Software-based solutions are needed to cooperate with hardware. In this work, we propose a Cyclic Redundancy Checks (CRCs) based software mechanism for task-parallel HPC applications. Our mechanism incurs only 1.7% performance overhead with hardware acceleration while being highly scalable at large scale. Our mathematical analysis demonstrates the effectiveness of our scheme and its error coverage. Results show that our CRCbased mechanism reduces the memory vulnerability by 87% on average with up to 32-bit burst (consecutive) and 5-bit arbitrary error correction capability.
Designing and Modelling Selective Replication for Fault-tolerant HPC Applications
(IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2017) Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü;
Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However few studies address both types of errors together. In this paper we propose a software-based selective replication technique for HPC applications for both fail-stop errors and SDCs. Since complete replication of applications can be costly in terms of resources, we develop a runtime-based technique for selective replication. Selective replication provides an opportunity to meet HPC reliability targets while decreasing resource costs. Our technique is low-overhead, automatic and completely transparent to the user.
A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs
(ASSOC COMPUTING MACHINERY, 2 PENN PLAZA, STE 701, NEW YORK, NY 10121-0701 USA, 2020) Cini, Nevin; Yalcin, Gulay; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
Today, GPUs are widely used as coprocessors/accelerators in High-Performance Heterogeneous Computing due to their many advantages. However, many researches emphasize that GPUs are not as reliable as desired yet. Despite the fact that GPUs are more vulnerable to hardware errors than CPUs, the use of GPUs in HPCs is increasing more and more. Moreover, due to native reliability problems of GPUs, combining a great number of GPUs with CPUs can significantly increase HPCs' failure rates. For this reason, analyzing the reliability characteristics of GPU-based HPCs has become a very important issue. Therefore, in this study we evaluate the reliability of GPU-based HPCs. For this purpose, we first examined field data analysis studies for GPU-based and CPU-based HPCs and identified factors that could increase systems failure/error rates. We then compared GPU-based HPCs with CPU-based HPCs in terms of reliability with the help of these factors in order to point out reliability challenges of GPU-based HPCs. Our primary goal is to present a study that can guide the researchers in this field by indicating the current state of GPU-based heterogeneous HPCs and requirements for the future, in terms of reliability. Our second goal is to offer a methodology to compare the reliability of GPU-based HPCs and CPU-based HPCs. To the best of our knowledge, this is the first survey study to compare the reliability of GPU-based and CPU-based HPCs in a systematic manner.
A runtime heuristic to selectively replicate tasks for application-specific reliability targets
(IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus; AGÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü;
n this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification/recompilation of OS, compiler or application code. Our heuristic, we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.