Publications

Results 1–50 of 202
Skip to search filters

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez, Dan; Liber, Nevin; Madsen, Jonathan; Miles, Jeff; Poliakoff, David Z.; Powell, Amy J.; Rajamanickam, Sivasankaran R.; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

More Details

LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales

Computer Physics Communications

Thompson, Aidan P.; Aktulga, H.M.; Berger, Richard; Bolintineanu, Dan S.; Brown, W.M.; Crozier, Paul C.; in 't Veld, Pieter J.; Kohlmeyer, Axel; Moore, Stan G.; Nguyen, Trung D.; Shan, Ray; Stevens, Mark J.; Tranchida, Julien; Trott, Christian R.; Plimpton, Steven J.

Since the classical molecular dynamics simulator LAMMPS was released as an open source code in 2004, it has become a widely-used tool for particle-based modeling of materials at length scales ranging from atomic to mesoscale to continuum. Reasons for its popularity are that it provides a wide variety of particle interaction models for different materials, that it runs on any platform from a single CPU core to the largest supercomputers with accelerators, and that it gives users control over simulation details, either via the input script or by adding code for new interatomic potentials, constraints, diagnostics, or other features needed for their models. As a result, hundreds of people have contributed new capabilities to LAMMPS and it has grown from fifty thousand lines of code in 2004 to a million lines today. In this paper several of the fundamental algorithms used in LAMMPS are described along with the design strategies which have made it flexible for both users and developers. We also highlight some capabilities recently added to the code which were enabled by this flexibility, including dynamic load balancing, on-the-fly visualization, magnetic spin dynamics models, and quantum-accuracy machine learning interatomic potentials. Program Summary: Program Title: Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) CPC Library link to program files: https://doi.org/10.17632/cxbxs9btsv.1 Developer's repository link: https://github.com/lammps/lammps Licensing provisions: GPLv2 Programming language: C++, Python, C, Fortran Supplementary material: https://www.lammps.org Nature of problem: Many science applications in physics, chemistry, materials science, and related fields require parallel, scalable, and efficient generation of long, stable classical particle dynamics trajectories. Within this common problem definition, there lies a great diversity of use cases, distinguished by different particle interaction models, external constraints, as well as timescales and lengthscales ranging from atomic to mesoscale to macroscopic. Solution method: The LAMMPS code uses parallel spatial decomposition, distributed neighbor lists, and parallel FFTs for long-range Coulombic interactions [1]. The time integration algorithm is based on the Størmer-Verlet symplectic integrator [2], which provides better stability than higher-order non-symplectic methods. In addition, LAMMPS supports a wide range of interatomic potentials, constraints, diagnostics, software interfaces, and pre- and post-processing features. Additional comments including restrictions and unusual features: This paper serves as the definitive reference for the LAMMPS code. References: [1] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics. J. Comp. Phys. 117 (1995) 1–19. [2] L. Verlet, Computer experiments on classical fluids: I. Thermodynamical properties of Lennard–Jones molecules, Phys. Rev. 159 (1967) 98–103.

More Details

Mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards

Proceedings of P3HPC 2019: International Workshop on Performance, Portability and Productivity in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Hollman, David S.; Lelbach, Bryce; Edwards, H.C.; Hoemmen, Mark F.; Sunderland, Daniel S.; Trott, Christian R.

Multi-dimensional arrays are ubiquitous in high-performance computing (HPC), but their absence from the C++ language standard is a long-standing and well-known limitation of their use for HPC. This paper describes the design and implementation of mdspan, a proposed C++ standard multidimensional array view (planned for inclusion in C++23). The proposal is largely inspired by work done in the Kokkos project - a C++ performance-portable programming model de- ployed by numerous HPC institutions to prepare their code base for exascale-class supercomputing systems. This paper describes the final design of mdspan af- ter a five-year process to achieve consensus in the C++ community. In particular, we will lay out how the design addresses some of the core challenges of performance-portable programming, and how its cus- tomization points allow a seamless extension into areas not currently addressed by the C++ Standard but which are of critical importance in the heterogeneous computing world of today's systems. Finally, we have provided a production-quality implementation of the proposal in its current form. This work includes several benchmarks of this implementation aimed at demon- strating the zero-overhead nature of the modern design.

More Details
Results 1–50 of 202
Results 1–50 of 202