Abstract. Advection of trace species (tracers), also called tracer transport, in models of the atmosphere and other physical domains is an important and potentially computationally expensive part of a model's dynamical core. Semi-Lagrangian (SL) advection methods are efficient because they permit a time step much larger than the advective stability limit for explicit Eulerian methods without requiring the solution of a globally coupled system of equations as implicit Eulerian methods do. Thus, to reduce the computational expense of tracer transport, dynamical cores often use SL methods to advect tracers. The class of interpolation semi-Lagrangian (ISL) methods contains potentially extremely efficient SL methods. We describe a finite-element ISL transport method that we call the interpolation semi-Lagrangian element-based transport (Islet) method, suitable, for example, for atmosphere models discretized using the spectral element method. The Islet method uses three grids that share an element grid: a dynamics grid supporting, for example, the Gauss–Legendre–Lobatto basis of degree three; a physics parameterizations grid with a configurable number of finite-volume subcells per element; and a tracer grid supporting Islet bases, with the particular basis again configurable. This method provides extremely accurate tracer transport and excellent diagnostic values in a number of verification problems.
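The core idea of an interpolation semi-Lagrangian step, as described in the abstract above, can be illustrated with a minimal sketch. It assumes a uniform periodic 1-D grid, a constant velocity, and plain linear interpolation; it does not use the Islet bases or the element grid, and the function name `isl_step` is hypothetical. Each grid point traces back to its departure point and interpolates the old field there, so the Courant number may exceed one.

```python
import math


def isl_step(q, courant):
    """Advance periodic 1-D tracer values by one semi-Lagrangian step.

    q       : list of nodal values on a uniform periodic grid
    courant : u * dt / dx; values larger than 1 are allowed
    Linear interpolation is an assumption made for brevity.
    """
    n = len(q)
    out = []
    for i in range(n):
        x_dep = i - courant      # departure point in grid-index units
        j = math.floor(x_dep)    # grid point at or to the left of x_dep
        w = x_dep - j            # fractional offset in [0, 1)
        # Interpolate the old field at the departure point (periodic wrap).
        out.append((1.0 - w) * q[j % n] + w * q[(j + 1) % n])
    return out


q0 = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
q1 = isl_step(q0, 2.5)  # Courant number 2.5, beyond the explicit Eulerian limit
```

Because linear interpolation weights are nonnegative and sum to one, this sketch cannot create new extrema; higher-order bases generally need a separate property-preservation step.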
We present a new evaluation framework for implicit–explicit (IMEX) Runge-Kutta time-stepping schemes. The new framework uses a linearized nonhydrostatic system of normal modes. We utilize the framework to investigate the stability of IMEX methods and their dispersion and dissipation of gravity, Rossby, and acoustic waves. We test the new framework on a variety of IMEX schemes and use it to develop and analyze a set of second-order low-storage IMEX Runge-Kutta methods with a high Courant-Friedrichs-Lewy (CFL) number. We show that the new framework is more selective than the 2-D acoustic system previously used in the literature. Schemes that are stable for the 2-D acoustic system are not necessarily stable for the system of normal modes.
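To make the notions of stability, dissipation, and dispersion concrete, here is a minimal sketch for the simplest IMEX scheme, first-order IMEX Euler, applied to a scalar test equation y' = λ_E y + λ_I y. This is an assumption chosen for illustration; it is neither the normal-mode system nor one of the Runge-Kutta schemes from the abstract above, and the function names are hypothetical.

```python
import cmath


def imex_euler_amplification(z_explicit, z_implicit):
    """Amplification factor R of IMEX Euler for y' = (lam_E + lam_I) * y.

    z_explicit = dt * lam_E is treated with forward Euler,
    z_implicit = dt * lam_I with backward Euler, so that
    y_{n+1} = (1 + z_explicit) / (1 - z_implicit) * y_n.
    """
    return (1.0 + z_explicit) / (1.0 - z_implicit)


def wave_diagnostics(omega_slow_dt, omega_fast_dt):
    """Return (|R|, phase error) for purely oscillatory slow and fast modes."""
    r = imex_euler_amplification(1j * omega_slow_dt, 1j * omega_fast_dt)
    exact_phase = omega_slow_dt + omega_fast_dt
    # |R| > 1 means unstable; |R| < 1 means the wave is damped (dissipation).
    # The phase difference measures dispersion error over one step.
    return abs(r), cmath.phase(r) - exact_phase


# A slow wave stepped explicitly together with a stiff fast wave stepped implicitly.
magnitude, phase_error = wave_diagnostics(0.5, 20.0)
```

For this scheme |R|² = (1 + a²)/(1 + b²) with a and b the explicit and implicit frequencies times dt, so the step is stable only when the implicit part is at least as stiff as the explicit part.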
We present an effort to port the nonhydrostatic atmosphere dynamical core of the Energy Exascale Earth System Model (E3SM) to efficiently run on a variety of architectures, including conventional CPU, many-core CPU, and GPU. We specifically target cloud-resolving resolutions of 3 km and 1 km. To express on-node parallelism we use the C++ library Kokkos, which allows us to achieve a performance-portable code in a largely architecture-independent way. Our C++ implementation is at least as fast as the original Fortran implementation on IBM Power9 and Intel Knights Landing processors, showing that the code refactor did not compromise the efficiency on CPU architectures. On the other hand, when using the GPUs, our implementation is able to achieve 0.97 Simulated Years Per Day, running on the full Summit supercomputer. To the best of our knowledge, this is the highest throughput achieved to date by any global atmosphere dynamical core running at such resolutions.
We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPU (e.g., Intel Xeon), many-core CPU (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPU.
Atmospheric tracer transport is a computationally demanding component of the atmospheric dynamical core of weather and climate simulations. Simulations typically have tens to hundreds of tracers. A tracer field is required to preserve several properties, including mass, shape, and tracer consistency. To improve computational efficiency, it is common to apply different spatial and temporal discretizations to the tracer transport equations than to the dynamical equations. Using different discretizations increases the difficulty of preserving properties. This paper provides a unified framework to analyze the property preservation problem and classes of algorithms to solve it. We examine the primary problem and a safety problem; describe three classes of algorithms to solve these; introduce new algorithms in two of these classes; make connections among the algorithms; analyze each algorithm in terms of correctness, bound on its solution magnitude, and its communication efficiency; and study numerical results. A new algorithm, QLT, has the smallest communication volume, and in an important case it redistributes mass approximately locally. These algorithms are only very loosely coupled to the underlying discretizations of the dynamical and tracer transport equations and thus are broadly and efficiently applicable. In addition, they may be applied to remap problems in applications other than tracer transport.
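The property-preservation problem in the abstract above can be illustrated with a minimal sketch: given tracer values that violate their bounds, find nearby values that respect the bounds and carry a prescribed total mass. The clip-then-redistribute scheme below is a generic illustration under stated assumptions; it is not QLT, and the function name and argument layout are hypothetical. In a parallel setting the two sums would be global reductions, which is where communication cost enters.

```python
def clip_and_redistribute(q, w, lo, hi, mass):
    """Return values x with lo[i] <= x[i] <= hi[i] and sum(w[i] * x[i]) == mass.

    q      : input tracer values (may violate the bounds)
    w      : positive cell weights (for example, cell air masses)
    lo, hi : per-cell lower and upper bounds
    mass   : target weighted sum; assumed to satisfy
             sum(w * lo) <= mass <= sum(w * hi), otherwise no solution exists
    """
    # Step 1: clip every value into its bounds.
    x = [min(max(qi, l), h) for qi, l, h in zip(q, lo, hi)]
    # Step 2: measure the mass error introduced by clipping.
    dm = mass - sum(wi * xi for wi, xi in zip(w, x))
    # Step 3: compute each cell's remaining capacity in the needed direction.
    if dm > 0:
        cap = [h - xi for xi, h in zip(x, hi)]
    else:
        cap = [xi - l for xi, l in zip(x, lo)]
    total = sum(wi * ci for wi, ci in zip(w, cap))
    # Step 4: spread the mass error in proportion to capacity.
    if total > 0:
        frac = dm / total  # lies in [-1, 1] when the problem is feasible
        x = [xi + frac * ci for xi, ci in zip(x, cap)]
    return x


values = clip_and_redistribute(
    q=[-0.1, 0.5, 1.2, 0.3], w=[1.0] * 4, lo=[0.0] * 4, hi=[1.0] * 4, mass=1.9
)
```

This sketch redistributes mass globally; the abstract notes that QLT instead redistributes mass approximately locally in an important case.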
A set of algorithms based on characteristic discontinuous Galerkin methods is presented for tracer transport on the sphere. The algorithms are designed to reduce Message Passing Interface (MPI) communication volume per unit of simulated time relative to current methods generally, and to the spectral element scheme employed by the U.S. Department of Energy's Energy Exascale Earth System Model (E3SM) specifically. Two methods are developed to enforce discrete mass conservation when the transport schemes are coupled to a separate dynamics solver: constrained transport and Jacobian-combined transport. A communication-efficient method is introduced to enforce tracer consistency between the transport scheme and dynamics solver; this method also provides the transport scheme's shape preservation capability. A subset of the algorithms derived here is implemented in E3SM and shown to improve transport performance by a factor of 2.2 for the model's standard configuration with 40 tracers at the strong scaling limit of one element per core.
Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup over OpenMP loops around optimized DGEMM, DTRSM and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present detailed performance analysis of our kernels.
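The interleaving described in the abstract above can be sketched as an index mapping. The sketch assumes row-major ordering of entries within each block and zero padding of the last block; the actual ordering used by KokkosKernels or Intel Math Kernel Library may differ, and the names `compact_index` and `pack_compact` are hypothetical. The point is that entry (i, j) of `vlen` consecutive matrices ends up contiguous in memory, so one SIMD instruction can process the same entry of several matrices.

```python
def compact_index(k, i, j, m, n, vlen):
    """Flat offset of entry (i, j) of matrix k in an interleaved buffer.

    Matrices are m x n and grouped into blocks of vlen matrices;
    the lane (position within the block) is the fastest-varying index.
    """
    block, lane = divmod(k, vlen)
    return ((block * m + i) * n + j) * vlen + lane


def pack_compact(mats, vlen):
    """Pack a list of equally sized matrices (nested lists) into one flat list."""
    m, n = len(mats[0]), len(mats[0][0])
    nblocks = -(-len(mats) // vlen)           # ceiling division
    buf = [0.0] * (nblocks * m * n * vlen)    # the last block is zero padded
    for k, a in enumerate(mats):
        for i in range(m):
            for j in range(n):
                buf[compact_index(k, i, j, m, n, vlen)] = a[i][j]
    return buf


# Three 2 x 2 matrices packed with a hypothetical vector length of 2.
mats = [[[10 * k + 2 * i + j for j in range(2)] for i in range(2)] for k in range(3)]
buf = pack_compact(mats, vlen=2)
```

A batched kernel over this buffer would loop over blocks and matrix entries as usual and treat the innermost `vlen` lanes as a single vector operation.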
Physics-based models of volcanic eruptions track conduit processes as functions of depth and time. When used in inversions, these models permit integration of diverse geological and geophysical data sets to constrain important parameters of magmatic systems. We develop a 1-D steady-state conduit model for effusive eruptions including equilibrium crystallization and gas transport through the conduit and compare with the quasi-steady dome growth phase of Mount St. Helens in 2005. Viscosity increase resulting from pressure-dependent crystallization leads to a natural transition from viscous flow to frictional sliding on the conduit margin. Erupted mass flux depends strongly on wall rock and magma permeabilities due to their impact on magma density. Including both lateral and vertical gas transport reveals competing effects that produce nonmonotonic behavior in the mass flux when increasing magma permeability. Using this physics-based model in a Bayesian inversion, we link data sets from Mount St. Helens such as extrusion flux and earthquake depths with petrological data to estimate unknown model parameters, including magma chamber pressure and water content, magma permeability constants, conduit radius, and friction along the conduit walls. Even with this relatively simple model and limited data, we obtain improved constraints on important model parameters. We find that the magma chamber had low (<5 wt %) total volatiles and that the magma permeability scale is well constrained at $\sim 10^{-11.4}$ m$^2$ to reproduce observed dome rock porosities. Compared with previous results, higher magma overpressure and lower wall friction are required to compensate for increased viscous resistance while keeping extrusion rate at the observed value.
The geodetically derived interseismic moment deficit rate (MDR) provides a first-order constraint on earthquake potential and can play an important role in seismic hazard assessment, but quantifying uncertainty in MDR is a challenging problem that has not been fully addressed. We establish criteria for reliable MDR estimators, evaluate existing methods for determining the probability density of MDR, and propose and evaluate new methods. Geodetic measurements moderately far from the fault provide tighter constraints on MDR than those nearby. Previously used methods can fail catastrophically under predictable circumstances. The bootstrap method works well with strong data constraints on MDR, but can be strongly biased when network geometry is poor. We propose two new methods: the Constrained Optimization Bounding Estimator (COBE) assumes uniform priors on slip rate (from geologic information) and MDR, and can be shown through synthetic tests to be a useful, albeit conservative, estimator; the Constrained Optimization Bounding Linear Estimator (COBLE) is the corresponding linear estimator with Gaussian priors rather than point-wise bounds on slip rates. COBE matches COBLE with strong data constraints on MDR. We compare results from COBE and COBLE to previously published results for the interseismic MDR at Parkfield, on the San Andreas Fault, and find similar results; thus, the apparent discrepancy between MDR and the total moment release (seismic and afterslip) in the 2004 Parkfield earthquake remains.
High performance computing (HPC) is undergoing a dramatic change in computing architectures. Next-generation HPC systems are being based primarily on many-core processing units and general purpose graphics processing units (GPUs). A computing node on a next-generation system can be, and in practice is, heterogeneous in nature, involving multiple memory spaces and multiple execution spaces. This presents a challenge for the development of application codes that wish to compute at the extreme scales afforded by these next-generation HPC technologies and systems: the best parallel programming model for one system is not necessarily the best parallel programming model for another. This inevitably raises the following question: how does an application code achieve high performance on disparate computing architectures without having entirely different, or at least significantly different, code paths, one for each architecture? This question has given rise to the term ‘performance portability’, a notion concerned with porting application code performance from architecture to architecture using a single code base. In this paper, we present the work being done at Sandia National Labs to develop a performance-portable compressible CFD code that is targeting the ‘leadership’ class supercomputers the National Nuclear Security Administration (NNSA) is acquiring over the course of the next decade.
Newton–Krylov solvers for ocean tracers have the potential to greatly decrease the computational costs of spinning up deep-ocean tracers, which can take several thousand model years to reach equilibrium with surface processes. One version of the algorithm uses offline tracer transport matrices to simulate an annual cycle of tracer concentrations and applies Newton's method to find concentrations that are periodic in time. Here we present the impact of time-averaging the transport matrices on the equilibrium values of an ideal-age tracer. We compared annually-averaged, monthly-averaged, and 5-day-averaged transport matrices to an online simulation using the ocean component of the Community Earth System Model (CESM) with a nominal horizontal resolution of 1° × 1° and 60 vertical levels. We found that increasing the time resolution of the offline transport model reduced a low age bias from 12% for the annually-averaged transport matrices, to 4% for the monthly-averaged transport matrices, and to less than 2% for the transport matrices constructed from 5-day averages. The largest differences were in areas with strong seasonal changes in the circulation, such as the Northern Indian Ocean. For many applications the relatively small bias obtained using the offline model makes the offline approach attractive because it uses significantly fewer computer resources and is simpler to set up and run.
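The periodic-solution idea in the abstract above can be sketched with a toy problem: apply Newton's method to F(a) = Φ(a) − a, where Φ advances the tracer through one annual cycle. Everything in the sketch is an assumption made for illustration: a single deep-ocean box whose ideal age grows at 1 year per year and is ventilated at a hypothetical rate `k`, forward Euler time stepping, and a finite-difference derivative. With one unknown no Krylov solver is needed; in a full model the Newton linear systems would be solved iteratively with a Krylov method.

```python
def annual_map(age, k=1.0e-3, steps=12):
    """Advance a hypothetical one-box ideal-age model through one year.

    The age grows at 1 yr/yr and relaxes toward zero at rate k (per year)
    through exchange with zero-age surface water.
    """
    dt = 1.0 / steps
    for _ in range(steps):
        age += dt * (1.0 - k * age)
    return age


def newton_periodic(age0, tol=1.0e-8, max_iter=10, eps=1.0):
    """Find an age with annual_map(age) == age using Newton's method."""
    age = age0
    for iteration in range(max_iter):
        f = annual_map(age) - age            # drift over one annual cycle
        if abs(f) < tol:
            return age, iteration
        # Finite-difference derivative of F(a) = annual_map(a) - a.
        df = (annual_map(age + eps) - annual_map(age) - eps) / eps
        age -= f / df
    return age, max_iter


equilibrium_age, iterations = newton_periodic(0.0)
```

In this toy model the annual map is affine, so Newton's method needs very few evaluations of the annual cycle, whereas direct time stepping from zero approaches the equilibrium only on the e-folding time scale 1/k of 1000 years.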