As the push toward exascale hardware has increased the diversity of system architectures, performance portability has become a critical concern for scientific software. We describe the Kokkos Performance Portable Programming Model, which allows developers to write single-source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchies of modern hardware. We describe the novel abstractions added in Kokkos version 3, such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations, to prepare for exascale-era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.
Solving dense systems of linear equations is essential in applications encountered in physics, mathematics, and engineering. This paper describes our current efforts toward the development of the ADELUS package for current and next-generation distributed, accelerator-based, high-performance computing platforms. The package solves dense linear systems using LU factorization with partial pivoting on distributed-memory systems with CPUs/GPUs. The matrix is block-mapped onto the distributed memory of the CPUs/GPUs and is solved as if it were torus-wrapped for an optimal balance of computation and communication. A permutation operation is performed to restore the results so that the torus-wrap distribution is transparent to the user. The package targets performance portability by leveraging the abstractions provided in the Kokkos and Kokkos Kernels libraries. A comparison of the performance gains versus the state-of-the-art SLATE and DPLASMA GESV functionalities on the Summit supercomputer is provided. Preliminary performance results from large-scale electromagnetic simulations using ADELUS are also presented. The solver achieves 7.7 petaflops on 7600 GPUs of the Sierra supercomputer, which translates to 16.9% efficiency.
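To make the core numerical step concrete, the following is a minimal serial sketch of solving a dense system by LU factorization with partial (row) pivoting. It is not the ADELUS implementation: it assumes a small matrix held in one address space, omits the block mapping, torus-wrap distribution, and Kokkos abstractions entirely, and the function name `lu_solve_partial_pivoting` is hypothetical.

```python
def lu_solve_partial_pivoting(a, b):
    """Solve a x = b for a dense square matrix `a` (list of rows).

    Illustrative serial sketch only; the name and interface are
    assumptions, not part of any package described in the text.
    """
    n = len(a)
    lu = [list(row) for row in a]  # work on copies; inputs are not modified
    rhs = list(b)

    for k in range(n):
        # Partial pivoting: pick the row with the largest magnitude
        # entry in column k, at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0:
            # Simplistic exact-zero check; a real solver would use a tolerance.
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            rhs[k], rhs[p] = rhs[p], rhs[k]

        # Eliminate below the pivot; multipliers are stored in the
        # strictly lower triangle (the L factor, unit diagonal implied).
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

    # Forward substitution: L y = P b.
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lu[i][j] * y[j] for j in range(i))

    # Back substitution: U x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


# The zero in the top-left corner makes row pivoting necessary.
a = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0],
     [2.0, 1.0, 1.0]]
b = [7.0, 3.0, 7.0]
print(lu_solve_partial_pivoting(a, b))
```

Without the row exchange, the first elimination step in this example would divide by zero, which is the failure that partial pivoting guards against.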
In this article, we examine the coupling into an electrically short azimuthal slot on a cylindrical cavity operating at fundamental cavity modal frequencies. We first develop a matched bound formulation through which we can gather information on the maximum achievable levels of interior cavity fields. Actual field levels are below this matched bound; therefore, we also develop an unmatched formulation for frequencies below the slot resonance to gain better insight into the physics of this coupling. Good agreement is observed between the unmatched formulation, full-wave simulations, and experimental data, providing a validation of our analytical models. We then extend the unmatched formulation to treat an array of slots, and again find good agreement with full-wave simulations. These analytical models can be used to investigate ways to mitigate electromagnetic interference and electromagnetic compatibility effects within cavities.
This paper implements an approximate direct inverse for the surface integral equation, including the multilevel fast multipole method. We apply it as a preconditioner to two examples that suffer from convergence problems with an iterative solver.
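The role of an approximate inverse as a preconditioner can be illustrated with a small sketch. The text does not specify the iterative solver or the construction of the approximate inverse, so everything below is an assumption made for illustration: a simple Richardson iteration stands in for the solver, a diagonal (Jacobi) inverse stands in for the approximate direct inverse, and a 3-by-3 dense matrix stands in for the discretized surface integral equation. The function names are hypothetical.

```python
def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def preconditioned_richardson(a, b, apply_m, tol=1e-10, max_iter=500):
    """Iterate x <- x + M (b - A x), where apply_m(r) returns M r.

    M is any approximation of the inverse of A. Returns the final
    iterate and the number of iterations performed.
    """
    x = [0.0] * len(b)
    for it in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        if max(abs(ri) for ri in r) < tol:
            return x, it
        z = apply_m(r)  # preconditioning step: z approximates A^{-1} r
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter


a = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]


def jacobi(r):
    """Stand-in approximate inverse: divide by the diagonal of A."""
    return [ri / a[i][i] for i, ri in enumerate(r)]


def identity(r):
    """No preconditioning: M is the identity."""
    return list(r)


x_pre, it_pre = preconditioned_richardson(a, b, jacobi)
x_raw, it_raw = preconditioned_richardson(a, b, identity, max_iter=50)
print("preconditioned iterations:", it_pre)
print("unpreconditioned iterations:", it_raw)
```

In this toy setting the unpreconditioned iteration does not converge, while the same iteration with the stand-in approximate inverse does. That is the qualitative effect a preconditioner is meant to deliver, although the sketch says nothing about the behavior of the method described above.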