Kokkos Technical Review Slides and Discussion Notes
Abstract not provided.
Abstract not provided.
This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and evaluation of this capability with sparse matrix Cholesky factor- ization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences where each task contains data parallel operations performed on a team of hardware threads. The collection of tasks and dependences form a directed acyclic graph of tasks - a task DAG . Major chal- lenges of this research and development effort include: portability and performance across multicore CPU; manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).
Abstract not provided.
CEUR Workshop Proceedings
As semantic graph database technology grows to address components ranging from large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity makes it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a high performance hybrid system comprising computational capability for semantic graph database processing utilizing the multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths of the Billion Triple Challenge 2010 dataset.
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
The nearest neighbor search is a significant problem in transportation modeling and simulation. This paper describes how the nearest neighbor search is implemented efficiently with respect to running time in the NISAC Agent-Based Laboratory for Economics. The paper shows two methods to optimize running time of the nearest neighbor search. The first optimization uses a different distance metric that is more computationally efficient. The concept of a magnitude-comparable distance is described, and the paper gives a specific magnitude-comparable distance that is more computationally efficient than the actual distance function. The paper also shows how the given magnitude-comparable distance can be used to speed up the actual distance calculation. The second optimization reduces the number of points the search examines by using a spatial data structure. The paper concludes with testing of the different techniques discussed and the results.
Abstract not provided.
This report gives an overview of the types of economic methodologies and models used by Sandia economists in their consequence analysis work for the National Infrastructure Simulation & Analysis Center and other DHS programs. It describes the three primary resolutions at which analysis is conducted (microeconomic, mesoeconomic, and macroeconomic), the tools used at these three levels (from data analysis to internally developed and publicly available tools), and how they are used individually and in concert with each other and other infrastructure tools.
Large relational datasets such as national-scale social networks and power grids present different computational challenges than do physical simulations. Sandia's distributed-memory supercomputers are well suited for solving problems concerning the latter, but not the former. The reason is that problems such as pattern recognition and knowledge discovery on large networks are dominated by memory latency and not by computation. Furthermore, most memory requests in these applications are very small, and when the datasets are large, most requests miss the cache. The result is extremely low utilization. We are unlikely to be able to grow out of this problem with conventional architectures. As the power density of microprocessors has approached that of a nuclear reactor in the past two years, we have seen a leveling of Moores Law. Building larger and larger microprocessor-based supercomputers is not a solution for informatics and network infrastructure problems since the additional processors are utilized to only a tiny fraction of their capacity. An alternative solution is to use the paradigm of massive multithreading with a large shared memory. There is only one instance of this paradigm today: the Cray MTA-2. The proposal team has unique experience with and access to this machine. The XMT, which is now being delivered, is a Red Storm machine with up to 8192 multithreaded 'Threadstorm' processors and 128 TB of shared memory. For many years, the XMT will be the only way to address very large graph problems efficiently, and future generations of supercomputers will include multithreaded processors. Roughly 10 MTA processor can process a simple short paths problem in the time taken by the Gordon Bell Prize-nominated distributed memory code on 32,000 processors of Blue Gene/Light. We have developed algorithms and open-source software for the XMT, and have modified that software to run some of these algorithms on other multithreaded platforms such as the Sun Niagara and Opteron multi-core chips.
Abstract not provided.
SIAM Journal on Scientific Computing
Many problems in science and engineering require the solution of a long sequence of slowly changing linear systems. We propose and analyze two methods that significantly reduce the total number of matrix-vector products required to solve all systems. We consider the general case where both the matrix and right-hand side change, and we make no assumptions regarding the change in the right-hand sides. Furthermore, we consider general nonsingular matrices, and we do not assume that all matrices are pairwise close or that the sequence of matrices converges to a particular matrix. Our methods work well under these general assumptions, and hence form a significant advancement with respect to related work in this area. We can reduce the cost of solving subsequent systems in the sequence by recycling selected subspaces generated for previous systems. We consider two approaches that allow for the continuous improvement of the recycled subspace at low cost. We consider both Hermitian and non-Hermitian problems, and we analyze our algorithms both theoretically and numerically to illustrate the effects of subspace recycling. We also demonstrate the effectiveness of our algorithms for a range of applications from computational mechanics, materials science, and computational physics. © 2006 Society for Industrial and Applied Mathematics.