
Results 1126–1150 of 9,998

Cache Oblivious Strategies to Exploit Multi-Level Memory on Manycore Systems

Proceedings of MCHPC 2020: Workshop on Memory Centric High Performance Computing, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Butcher, Neil A.; Olivier, Stephen L.; Kogge, Peter M.

Many-core systems are beginning to feature novel large, high-bandwidth intermediate memory as a visible part of the memory hierarchy. This paper discusses how to make use of intermediate memory when composing matrix multiply with transpose to compute $A A^T$. We re-purpose the cache-oblivious approach developed by Frigo et al. and apply it to the composition of a bandwidth-bound kernel (transpose) with a compute-bound kernel (matrix multiply). Particular focus is on regions of matrix shapes far from square that are not usually considered. Our codes are simpler than optimized codes, but reasonably close in performance. Perhaps more important, the work develops a paradigm for constructing other codes that use intermediate memories.
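
As an illustration of the general idea, the following is a minimal pure-Python sketch of a cache-oblivious recursion for $C = A A^T$. It is not the authors' code: the function names `a_at` and `_accumulate` and the `base` cutoff are hypothetical, and the sketch assumes only the standard divide-and-conquer rule of halving the largest of the three loop dimensions. The transpose is never materialized, because rows of $A$ are read directly for both operands.

```python
def _accumulate(a, c, i0, i1, j0, j1, k0, k1, base):
    """Accumulate C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * A[j0:j1, k0:k1]^T."""
    m, n, k = i1 - i0, j1 - j0, k1 - k0
    if max(m, n, k) <= base:
        # Base case: small block, plain triple loop over rows of A.
        for i in range(i0, i1):
            row_i = a[i]
            for j in range(j0, j1):
                row_j = a[j]
                acc = 0.0
                for l in range(k0, k1):
                    acc += row_i[l] * row_j[l]
                c[i][j] += acc
        return
    # Recursive case: halve the largest dimension, with no cache sizes needed.
    if m >= n and m >= k:
        mid = i0 + m // 2
        _accumulate(a, c, i0, mid, j0, j1, k0, k1, base)
        _accumulate(a, c, mid, i1, j0, j1, k0, k1, base)
    elif n >= k:
        mid = j0 + n // 2
        _accumulate(a, c, i0, i1, j0, mid, k0, k1, base)
        _accumulate(a, c, i0, i1, mid, j1, k0, k1, base)
    else:
        mid = k0 + k // 2
        _accumulate(a, c, i0, i1, j0, j1, k0, mid, base)
        _accumulate(a, c, i0, i1, j0, j1, mid, k1, base)


def a_at(a, base=8):
    """Return A * A^T for a list-of-rows matrix A (hypothetical helper)."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    c = [[0.0] * rows for _ in range(rows)]
    _accumulate(a, c, 0, rows, 0, rows, 0, cols, base)
    return c
```

Because the recursion always splits the largest dimension, it adapts to tall or wide matrices without any shape-specific tuning, which is the property that makes the approach attractive for shapes far from square.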


Chronicles of Astra: Challenges and lessons from the first petascale Arm supercomputer

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Pedretti, Kevin P.; Younge, Andrew J.; Hammond, Simon D.; Laros, James H.; Curry, Matthew J.; Aguilar, Michael J.; Hoekstra, Robert J.; Brightwell, Ronald B.

Arm processors have been explored in HPC for several years; however, there has not yet been a demonstration of viability for supporting large-scale production workloads. In this paper, we offer a retrospective on the process of bringing up Astra, the first petascale supercomputer based on 64-bit Arm processors, and validating its ability to run production HPC applications. Through this process, several immature technology gaps were addressed, including software stack enablement, Linux bugs at scale, thermal management issues, power management capabilities, and advanced container support. From this experience, several lessons learned are formulated that contributed to the successful deployment of Astra. These insights can be helpful to accelerate deploying and maturing other first-seen HPC technologies. With Astra now supporting many users running a diverse set of production applications at multi-thousand node scales, we believe this constitutes strong supporting evidence that Arm is a viable technology for even the largest-scale supercomputer deployments.


Interface Flux Recovery coupling method for the ocean–atmosphere system

Results in Applied Mathematics

Sockwell, K.C.; Peterson, Kara J.; Kuberry, Paul A.; Bochev, Pavel B.; Trask, Nat

Component coupling is a crucial part of climate models, such as DOE's E3SM (Caldwell et al., 2019). A common coupling strategy in climate models is for their components to exchange flux data from the previous time-step. This approach effectively performs a single step of an iterative solution method for the monolithic coupled system, which may lead to instabilities and loss of accuracy. In this paper we formulate an Interface-Flux-Recovery (IFR) coupling method which improves upon the conventional coupling techniques in climate models. IFR starts from a monolithic formulation of the coupled discrete problem and then uses a Schur complement to obtain an accurate approximation of the flux across the interface between the model components. This decouples the individual components and allows one to solve them independently by using schemes that are optimized for each component. To demonstrate the feasibility of the method, we apply IFR to a simplified ocean–atmosphere model for heat-exchange coupled through the so-called bulk condition, common in ocean–atmosphere systems. We then solve this model on matching and non-matching grids to estimate numerically the convergence rates of the IFR coupling scheme.
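
The role of the Schur complement can be illustrated with a generic two-component linear system. This is a sketch under stated assumptions, not the paper's actual discretization: the operators $A_1, A_2$, the interface maps $G_1, G_2$, the right-hand sides $f_1, f_2$, and the flux unknown $\lambda$ are all hypothetical symbols, and the interface condition is assumed to be simple continuity, $G_1 u_1 = G_2 u_2$.

```latex
% Monolithic coupled system (assumed form) with interface flux \lambda:
\begin{bmatrix}
  A_1 & 0   & G_1^{T} \\
  0   & A_2 & -G_2^{T} \\
  G_1 & -G_2 & 0
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \\ 0 \end{bmatrix}

% Eliminating u_1 = A_1^{-1}(f_1 - G_1^{T}\lambda) and
% u_2 = A_2^{-1}(f_2 + G_2^{T}\lambda) from the third row gives
% the Schur complement equation for the flux alone:
\left( G_1 A_1^{-1} G_1^{T} + G_2 A_2^{-1} G_2^{T} \right) \lambda
  = G_1 A_1^{-1} f_1 - G_2 A_2^{-1} f_2
```

Once $\lambda$ is known, the two rows for $u_1$ and $u_2$ no longer reference each other, so each component can be solved independently with its own solver.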


Method of information entropy for convergence assessment of molecular dynamics simulations

Journal of Applied Physics

Talaat, Khaled; Cowen, Benjamin J.; Anderoglu, Osman

The lack of a reliable method to evaluate the convergence of molecular dynamics simulations has contributed to discrepancies in different areas of molecular dynamics. In the present work, the method of information entropy is introduced to molecular dynamics for stationarity assessment. The Shannon information entropy formalism is used to monitor the convergence of the atom motion to a steady state in a continuous spatial domain and is also used to assess the stationarity of calculated multidimensional fields such as the temperature field in a discrete spatial domain. It is demonstrated in this work that monitoring the information entropy of the atom position matrix provides a clear indicator of reaching steady state in radiation damage simulations, non-equilibrium molecular dynamics thermal conductivity computations, and simulations of Poiseuille and Couette flow in nanochannels. A main advantage of the present technique is that it is non-local and relies on fundamental quantities available in all molecular dynamics simulations. Unlike monitoring average temperature, the technique is applicable to simulations that conserve total energy such as reverse non-equilibrium molecular dynamics thermal conductivity computations and to simulations where energy dissipates through a boundary as in radiation damage simulations. The method is applied to simulations of iron using the Tersoff/ZBL splined potential, silicon using the Stillinger-Weber potential, and a Lennard-Jones fluid. Its applicability to both solids and fluids shows that the technique has potential for generalization to other areas in molecular dynamics.
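
To make the Shannon entropy idea concrete, here is a minimal sketch that bins atom positions on a uniform grid and evaluates $H = -\sum_i p_i \ln p_i$ over the occupied bins. The binning step, the function name `position_entropy`, and the `bin_width` parameter are assumptions made for illustration; the abstract does not specify how the authors build the distribution from the atom position matrix, and their formulation may differ.

```python
import math
from collections import Counter


def position_entropy(positions, bin_width):
    """Shannon entropy (in nats) of atom positions binned on a uniform grid.

    positions: iterable of coordinate tuples, one per atom.
    bin_width: edge length of the (hypothetical) histogram bins.
    """
    positions = list(positions)
    n = len(positions)
    # Map each atom to the integer index of the bin that contains it.
    counts = Counter(
        tuple(math.floor(x / bin_width) for x in p) for p in positions
    )
    # H = -sum p_i ln p_i over occupied bins; empty bins contribute nothing.
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

In a stationarity check one would evaluate this quantity at successive snapshots and watch for it to level off; the threshold for "level" is a modelling choice not covered here.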


On differentiable local bounds preserving stabilization for Euler equations

Computer Methods in Applied Mechanics and Engineering

Badia, Santiago; Bonilla, Jesús; Mabuza, Sibusiso; Shadid, John N.

This work presents the design of nonlinear stabilization techniques for the finite element discretization of Euler equations in both steady and transient form. Implicit time integration is used in the case of the transient form. A differentiable local bounds preserving method has been developed, which combines a Rusanov artificial diffusion operator and a differentiable shock detector. Nonlinear stabilization schemes are usually stiff and highly nonlinear. This issue is mitigated by the differentiability properties of the proposed method. Moreover, in order to further improve the nonlinear convergence, we also propose a continuation method for a subset of the stabilization parameters. The resulting method has been successfully applied to steady and transient problems with complex shock patterns. Numerical experiments show that it is able to provide sharp and well resolved shocks. The importance of the differentiability is assessed by comparing the new scheme with its non-differentiable counterpart. Numerical experiments suggest that, for up to moderate nonlinear tolerances, the method exhibits improved robustness and nonlinear convergence behavior for steady problems. In the case of transient problems, we also observe a reduction in the computational cost.
