Publications

Results 126–150 of 186

Factors impacting performance of multithreaded sparse triangular solve

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Wolf, Michael M.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes. © 2011 Springer-Verlag Berlin Heidelberg.

Expanding the Trilinos developer community

Heroux, Michael A.

The Trilinos Project started approximately nine years ago as a small effort to enable research, development and ongoing support of small, related solver software efforts. The 'Tri' in Trilinos was intended to indicate the eventual three packages we planned to develop. In 2007 the project expanded its scope to include any package that was an enabling technology for technical computing. Presently the Trilinos repository contains over 55 packages covering a broad spectrum of reusable tools for constructing full-featured scalable scientific and engineering applications. Trilinos usage is now worldwide, and many applications have an explicit dependence on Trilinos for essential capabilities. Users come from other US laboratories, universities, industry and international research groups. Awareness and use of Trilinos are growing rapidly outside of Sandia. Members of the external research community are becoming more familiar with Trilinos, its design and collaborative nature. As a result, the Trilinos project is receiving an increasing number of requests from external community members who want to contribute to Trilinos as developers. To date we have worked with external developers in an ad hoc fashion. Going forward, we want to develop a set of policies, procedures, tools and infrastructure to simplify interactions with external developers. As we go forward with multi-laboratory efforts such as CASL and X-Stack, and international projects such as IESP, we will need a more streamlined and explicit process for making external developers 'first-class citizens' in the Trilinos development community. This document is intended to frame the discussion for expanding the Trilinos community to all strategically important external members, while at the same time preserving Sandia's primary leadership role in the project.

Trilinos for emerging parallel computing systems

Heroux, Michael A.

Trilinos is an object-oriented software framework to enable the solution of large-scale, complex multiphysics engineering and scientific problems. Different Trilinos packages build on each other to create a stack providing the necessary capability: (1) Non-linear solver; (2) Linear solver/preconditioner; (3) Distributed linear algebra; and (4) Local linear algebra.

Factors impacting performance of multithreaded triangular solve

Wolf, Michael M.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.

Parallel phase model: A programming model for high-end parallel machines with manycores

Proceedings of the International Conference on Parallel Processing

Brightwell, Ronald B.; Heroux, Michael A.; Wen, Zhaofang W.; Wu, Junfeng

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g., MPI). The design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results. © 2009 IEEE.
