Publications

Publications / SAND Report

Effective software design and development for the new graph architecture HPC machines

Software applications need to change and adapt as modern architectures evolve. Nowadays advancement in chip design translates to increased parallelism. Exploiting such parallelism is a major challenge in modern software engineering. Multicore processors are about to introduce a significant change in the way we design and use fundamental data structures. In this work we describe the design and programming principles of a software library of highly concurrent scalable and nonblocking data containers. In this project we have created algorithms and data structures for handling fundamental computations in massively multithreaded contexts, and we have incorporated these into a usable library with familiar look and feel. In this work we demonstrate the first design and implementation of a wait-free hash table. Our multiprocessor data structure design allows a large number of threads to concurrently insert, remove, and retrieve information. Non-blocking designs alleviate the problems traditionally associated with the use of mutual exclusion, such as bottlenecks and thread-safety. Lock-freedom provides the ability to share data without some of the drawbacks associated with locks, however, these designs remain susceptible to starvation. Furthermore, wait-freedom provides all of the benefits of lock-free synchronization with the added assurance that every thread makes progress in a finite number of steps. This implies deadlock-freedom, livelock-freedom, starvation-freedom, freedom from priority inversion, and thread-safety. The challenges of providing the desirable progress and correctness guarantees of wait-free objects makes their design and implementation difficult. There are few wait-free data structures described in the literature. Using only standard atomic operations provided by the hardware, our design is portable; therefore, it is applicable to a variety of data-intensive applications including the domains of embedded systems and supercomputers.Our experimental evaluation shows that our hash table design outperforms the most advanced locking solution, provided by Intel's TBB library, by 22%. When compared to more traditional locking designs we show a performance improvement by a factor of 7.92. When compared to alternative non-blocking designs, our hash table demonstrates solid performance gains in a large majority of cases, typically by a factor of 3.44.