Publications

Comprehensive uncertainty quantification (UQ) for full engineering models by solving probability density function (PDF) equation

Kolla, Hemanth K.; De, Saibal D.; Jones, Reese E.; Hansen, Michael A.; Plews, Julia A.

This report details a new method for propagating parameter uncertainty (forward uncertainty quantification) in computational mechanics applications based on partial differential equations (PDEs). The method provides full-field quantities of interest by solving the joint probability density function (PDF) equations implied by the PDEs with uncertain parameters. Full-field uncertainty quantification enables the design of complex systems where quantities of interest, such as failure points, are not known a priori. The method, motivated by the well-known PDF propagation method of turbulence modeling, uses an ensemble of solutions to provide the joint PDF of desired quantities at every point in the domain. A small subset of the ensemble is computed exactly, and the remaining samples are computed with an approximation of the driving (dynamics) term of the PDEs based on those exact solutions. Although the proposed method has commonalities with traditional interpolatory stochastic collocation methods applied directly to quantities of interest, it is distinct and exploits the parameter dependence and smoothness of the dynamics term of the governing PDEs. The efficacy of the method is demonstrated by applying it to two target problems: solid mechanics explicit dynamics with uncertain material model parameters, and reacting hypersonic fluid mechanics with uncertain chemical kinetic rate parameters. A minimally invasive implementation of the method for the representative codes SPARC (reacting hypersonics) and NimbleSM (finite-element solid mechanics) and associated software details are described. For the solid mechanics demonstration problems, the method shows orders-of-magnitude improvement in accuracy over traditional stochastic collocation. For the reacting hypersonics problem, the method is implemented as a streamline integration, and results show very good accuracy for the approximate sample solutions of re-entry flow past the Apollo capsule geometry at Mach 30.

A minimally invasive, efficient method for propagation of full-field uncertainty in solid dynamics

International Journal for Numerical Methods in Engineering

Jones, Reese E.; Redle, Michael T.; Kolla, Hemanth K.; Plews, Julia A.

We present a minimally invasive method for forward propagation of material property uncertainty to full-field quantities of interest in solid dynamics. Full-field uncertainty quantification enables the design of complex systems where quantities of interest, such as failure points, are not known a priori. The method, motivated by the well-known probability density function (PDF) propagation method of turbulence modeling, uses an ensemble of solutions to provide the joint PDF of desired quantities at every point in the domain. A small subset of the ensemble is computed exactly, and the remaining samples are computed with an approximation of the evolution equations based on those exact solutions. Although the proposed method has commonalities with traditional interpolatory stochastic collocation methods applied directly to quantities of interest, it is distinct and exploits the parameter dependence and smoothness of the driving term of the evolution equations. The implementation is model independent, storage and communication efficient, and straightforward. We demonstrate its efficiency, accuracy, scaling with dimension of the parameter space, and convergence in distribution with two problems: a quasi-one-dimensional bar impact, and a two-material notched plate impact. For the bar impact problem, we provide an analytical solution for the PDF of the solution fields for method validation. With the notched plate problem, we also demonstrate good parallel efficiency and scaling of the method.

Improving Scalability of Silent-Error Resilience for Message-Passing Solvers via Local Recovery and Asynchrony

Proceedings of FTXS 2020: Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Kolla, Hemanth K.; Mayo, Jackson M.; Teranishi, Keita T.; Armstrong, Robert C.

Benefits of local recovery (restarting only a failed process or task) have been previously demonstrated in parallel solvers. Local recovery has a reduced impact on application performance due to masking of failure delays (for message-passing codes) or dynamic load balancing (for asynchronous many-task codes). In this paper, we implement MPI-process-local checkpointing and recovery of data (as an extension of the Fenix library) in combination with an existing method for local detection of silent errors in partial-differential-equation solvers, to show a path for incorporating lightweight silent-error resilience. In addition, we demonstrate how asynchrony introduced by maximizing computation-communication overlap can halt the propagation of delays. For a prototype stencil solver (including an iterative-solver-like variant) with injected memory bit flips, results show greatly reduced overhead under weak scaling compared to global recovery, and high failure-masking efficiency. The approach is expected to be generalizable to other MPI-based solvers.

Enabling Resilience in Asynchronous Many-Task Programming Models

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Paul, Sri R.; Hayashi, Akihiro; Slattengren, Nicole S.; Kolla, Hemanth K.; Whitlock, Matthew J.; Bak, Seonmyeong; Teranishi, Keita T.; Mayo, Jackson M.; Sarkar, Vivek

Resilience is an imminent issue for next-generation platforms due to projected increases in soft/transient failures as part of the inherent trade-offs among performance, energy, and costs in system design. In this paper, we introduce a comprehensive approach to enabling application-level resilience in Asynchronous Many-Task (AMT) programming models with a focus on remedying Silent Data Corruption (SDC) that can often go undetected by the hardware and OS. Our approach makes it possible for the application programmer to declaratively express resilience attributes with minimal code changes, and to delegate the complexity of efficiently supporting resilience to our runtime system. We have created a prototype implementation of our approach as an extension to the Habanero C/C++ library (HClib), where different resilience techniques including task replay, task replication, algorithm-based fault tolerance (ABFT), and checkpointing are available. Our experimental results show that task replay incurs lower overhead than task replication when an appropriate error checking function is provided. Further, task replay matches the low overhead of ABFT. Our results also demonstrate the ability to combine different resilience schemes. To evaluate the effectiveness of our resilience mechanisms in the presence of errors, we injected synthetic errors at two error rates (1.0% and 10.0%) and found a modest increase in execution times. In summary, the results show that our approach supports efficient and scalable recovery, and that it can be used to influence the design of future AMT programming models and runtime systems that aim to integrate first-class support for user-level resilience.
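The task-replay idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the HClib API or the authors' implementation; the names `run_with_replay`, `make_flaky_sum`, and `is_even` are hypothetical, and the error check (a parity invariant) is an assumption chosen only to keep the example small.

```python
def run_with_replay(task, check, max_attempts=3):
    """Run task() until check(result) passes.

    Returns (result, attempts_used). Raises RuntimeError if every
    attempt fails the user-supplied error check.
    """
    for attempt in range(1, max_attempts + 1):
        result = task()
        if check(result):
            return result, attempt
    raise RuntimeError(f"task failed its error check {max_attempts} times")


def make_flaky_sum(values, corrupt_first_n):
    """Build a task that sums values, corrupting its first few results.

    The corruption stands in for a silent data error injected for testing.
    """
    calls = {"count": 0}

    def task():
        calls["count"] += 1
        total = sum(values)
        if calls["count"] <= corrupt_first_n:
            total += 1  # injected silent error
        return total

    return task


def is_even(result):
    """Hypothetical error check: a sum of even inputs must itself be even."""
    return result % 2 == 0


# Usage: the first execution is corrupted, so the task is replayed once.
result, attempts = run_with_replay(make_flaky_sum([2, 4, 6], 1), is_even)
print(result, attempts)
```

Replay only re-executes a task when the check fails, whereas replication always runs redundant copies. That is one plausible reading of why the abstract reports lower overhead for replay when a suitable check exists.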

Scalable collectives for distributed asynchronous many-task runtimes

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Whitlock, Matthew J.; Kolla, Hemanth K.; Treichler, Sean; Pebay, Philippe; Bennett, Janine C.

Global collectives (reductions/aggregations) are ubiquitous and feature in nearly every application of distributed high-performance computing (HPC). While it is advisable to devise algorithms that place collectives off the critical path of execution, they are sometimes unavoidable for correctness, numerical convergence, and analysis purposes. Scalable algorithms for distributed collectives are well studied and have become an integral part of MPI, but new and emerging distributed computing frameworks and paradigms such as Asynchronous Many-Task (AMT) models lack the same sophistication for distributed collectives. Since the central promise of AMT runtimes is that they automatically discover and expose task dependencies in the underlying program and can schedule work optimally to minimize idle time and hide data movement, a naively designed collectives protocol can completely offset any gains made from asynchronous execution. In this study we demonstrate that scalable distributed collectives are indispensable for performance in AMT models. We design, implement, and test the performance of a scalable collective algorithm in Legion, an exemplar data-centric AMT programming model. Our results show that AMT systems contain the necessary primitives that allow for fully scalable collectives without breaking the transparent data movement abstractions. Scalability tests of an integrated Legion 1D stencil mini-application show the clear benefit of implementing scalable collectives and the performance degradation when a naive collectives alternative is used instead.
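To make the contrast between naive and scalable collectives concrete, here is a small Python sketch of a generic binary-tree reduction over simulated ranks. It is not Legion code and not the algorithm from the paper; `tree_reduce` is a hypothetical helper, and the single-process simulation is an assumption made for brevity. A naive reduction funnels all P values through one root in P - 1 sequential steps, while the tree needs only about log2(P) rounds.

```python
import operator


def tree_reduce(values, op):
    """Combine one value per simulated rank with a binary tree.

    Returns (result, rounds), where rounds is the number of
    communication rounds, i.e. ceil(log2(P)) for P ranks.
    """
    vals = list(values)
    if not vals:
        raise ValueError("need at least one rank")
    n = len(vals)
    stride = 1
    rounds = 0
    while stride < n:
        # In each round, rank i absorbs the partial result of rank i + stride.
        for i in range(0, n - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
        rounds += 1
    return vals[0], rounds


# Usage: 8 simulated ranks, each contributing its rank id.
total, rounds = tree_reduce(range(8), operator.add)
print(total, rounds)
```

All pairs within a round are independent, so in a real runtime they could proceed concurrently. That concurrency is what keeps the collective off the critical path as the rank count grows.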

ASC CSSE Level 2 Milestone #6362: Resilient Asynchronous Many Task Programming Model

Teranishi, Keita T.; Kolla, Hemanth K.; Slattengren, Nicole S.; Whitlock, Matthew J.; Mayo, Jackson M.; Clay, Robert L.; Paul, Sri R.; Hayashi, Akihiro H.; Sarkar, Vivek S.

This report is an outcome of the ASC CSSE Level 2 Milestone 6362: Analysis of Resilient Asynchronous Many-Task (AMT) Programming Model. It comprises a summary and in-depth analysis of resilience schemes adapted to the AMT programming model. Herein, performance trade-offs of a resilient-AMT programming model are assessed through two approaches: (1) an analytical model realized by discrete event simulations and (2) empirical evaluation of benchmark programs representing regular and irregular workloads of explicit partial differential equation solvers. As part of this effort, an AMT execution simulator and a prototype resilient-AMT programming framework have been developed. The former permits us to hypothesize the performance behavior of a resilient-AMT model, and has undergone a verification and validation (V&V) process. The latter allows empirical evaluation of the performance of resilience schemes under emulated program failures and enabled the aforementioned V&V process. The outcome indicates that (1) resilience techniques implemented within an AMT framework allow efficient and scalable recovery under frequent failures, that (2) the abstraction of task and data instances in the AMT programming model enables readily usable Application Program Interfaces (APIs) for resilience, and that (3) this abstraction enables predicting the performance of resilient-AMT applications with a simple simulation infrastructure. This outcome will provide guidance for the design of the AMT programming model and runtime systems, user-level resilience support, and application development for ASC's next generation platforms (NGPs).

A novel shard-based approach for asynchronous many-task models for in situ analysis

Proceedings of ISAV 2017: In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis

Pébaÿ, P.P.; Borghesi, G.; Kolla, Hemanth K.; Bennett, Janine C.; Treichler, S.

We present the current status of our work towards a scalable, asynchronous many-task, in situ statistical analysis engine using the Legion runtime system, expanding upon earlier work, that was limited to a prototype implementation with a proxy mini-application as a surrogate for a full-scale scientific simulation code. In contrast, we have more recently integrated our in situ analysis engines with S3D, a full-size scientific application, and conducted numerical tests therewith on the largest computational platform currently available for DOE science applications. The goal of this article is thus to describe the SPMD-Legion methodology we devised in this context, and compare the data aggregation technique deployed herein to the approach taken within our previous work.

Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

SIAM Journal on Scientific Computing

Gamell, Marc G.; Teranishi, Keita T.; Kolla, Hemanth K.; Mayo, Jackson M.; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish P.

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.

Modeling and simulating multiple failure masking enabled by local recovery for stencil-based applications at extreme scales

IEEE Transactions on Parallel and Distributed Systems

Gamell, Marc; Teranishi, Keita T.; Mayo, Jackson M.; Kolla, Hemanth K.; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish

Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsically lower costs of recovering locally, but also due to derived effects when using some application types. In this paper we model one such effect, namely multiple failure masking, that manifests when running stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple failures recovered locally is modeled to enable several analyses of the probability of different levels of failure masking under certain stencil application behaviors. Our results indicate that failure masking is an extremely desirable effect at scale, whose manifestation is more evident and beneficial as the machine size or the failure rate increases.

Scalability of Several Asynchronous Many-Task Models for In Situ Statistical Analysis

Pebay, Philippe P.; Bennett, Janine C.; Kolla, Hemanth K.; Borghesi, G.

This report is a sequel to [PB16], in which we provided a first progress report on research and development towards a scalable, asynchronous many-task, in situ statistical analysis engine using the Legion runtime system. This earlier work included a prototype implementation of a proposed solution, using a proxy mini-application as a surrogate for a full-scale scientific simulation code. The first scalability studies were conducted with the above on modestly-sized experimental clusters. In contrast, in the current work we have integrated our in situ analysis engines with a full-size scientific application (S3D, using the Legion-SPMD model), and have conducted numerical tests on the largest computational platform currently available for DOE science applications. We also provide details regarding the design and development of a light-weight asynchronous collectives library. We describe how this library is utilized within our SPMD-Legion S3D workflow, and compare the data aggregation technique deployed herein to the approach taken within our previous work.

Metaprogramming-Enabled Parallel Execution of Apparently Sequential C++ Code

Proceedings of ESPM2 2016: 2nd International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis

Hollman, David S.; Bennett, Janine C.; Kolla, Hemanth K.; Lifflander, Jonathan; Slattengren, Nicole S.; Wilke, Jeremiah J.

Task-based execution models have received considerable attention in recent years to meet the performance challenges facing high-performance computing (HPC). In this paper we introduce MetaPASS (Metaprogramming-enabled Parallelism from Apparently Sequential Semantics), a proof-of-concept, non-intrusive header library that enables implicit task-based parallelism in a sequential C++ code. MetaPASS is a data-driven model, relying on dependency analysis of variable read/write accesses to derive a directed acyclic graph (DAG) of the computation to be performed. MetaPASS enables embedding of runtime dependency analysis directly in C++ applications using only template metaprogramming. Rather than requiring verbose task-based code or source-to-source compilers, a native C++ code can be made task-based with minimal modifications. We present an overview of the programming model enabled by MetaPASS and the C++ runtime API required to support it. Details are provided regarding how standard template metaprogramming is used to capture task dependencies. We finally discuss how the programming model can be deployed in both an MPI+X and in a standalone distributed memory context.
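The dependency analysis described here, deriving a DAG from variable read/write accesses in program order, can be sketched in a few lines of Python. This only illustrates the general idea; MetaPASS itself does this with C++ template metaprogramming, and the function `build_dag` and the task tuples below are hypothetical.

```python
def build_dag(tasks):
    """Derive task dependencies from read/write sets.

    tasks: list of (name, reads, writes) tuples in sequential program order.
    Returns a dict mapping each task name to the set of tasks it must wait for.
    """
    last_writer = {}          # variable -> most recent task that wrote it
    readers_since_write = {}  # variable -> tasks that read it since that write
    deps = {}
    for name, reads, writes in tasks:
        preds = set()
        for var in reads:
            if var in last_writer:
                preds.add(last_writer[var])  # read-after-write
        for var in writes:
            if var in last_writer:
                preds.add(last_writer[var])  # write-after-write
            preds.update(readers_since_write.get(var, ()))  # write-after-read
        deps[name] = preds
        # Record this task's accesses for later tasks.
        for var in reads:
            readers_since_write.setdefault(var, set()).add(name)
        for var in writes:
            last_writer[var] = name
            readers_since_write[var] = set()
    return deps


# Usage: an apparently sequential program expressed as access sets.
program = [
    ("init_a", [], ["a"]),
    ("init_b", [], ["b"]),
    ("add", ["a", "b"], ["c"]),
    ("scale_a", ["a"], ["a"]),
    ("report", ["c"], []),
]
for task, preds in build_dag(program).items():
    print(task, sorted(preds))
```

In this example `init_a` and `init_b` have no predecessors and could run concurrently, while `scale_a` must wait for `add` because it overwrites a variable that `add` reads.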

Flame thickness and conditional scalar dissipation rate in a premixed temporal turbulent reacting jet

Combustion and Flame

Chaudhuri, Swetaprovo; Kolla, Hemanth K.; Dave, Himanshu L.; Hawkes, Evatt R.; Chen, Jacqueline H.; Law, Chung K.

The flame structure corresponding to lean hydrogen–air premixed flames in intense sheared turbulence in the thin reaction zone regime is quantified from flame thickness and conditional scalar dissipation rate statistics, obtained from recent direct numerical simulation data of premixed temporally-evolving turbulent slot jet flames [1]. It is found that, on average, these sheared turbulent flames are thinner than their corresponding planar laminar flames. Extensive analysis is performed to identify the reason for this counter-intuitive thinning effect. The factors controlling the flame thickness are analyzed through two different routes i.e., the kinematic route, and the transport and chemical kinetics route. The kinematic route is examined by comparing the statistics of the normal strain rate due to fluid motion with the statistics of the normal strain rate due to varying flame displacement speed or self-propagation. It is found that while the fluid normal straining is positive and tends to separate iso-scalar surfaces, the dominating normal strain rate due to self-propagation is negative and tends to bring the iso-scalar surfaces closer resulting in overall thinning of the flame. The transport and chemical kinetics route is examined by studying the non-unity Lewis number effect on the premixed flames. The effects from the kinematic route are found to couple with the transport and chemical kinetics route. In addition, the intermittency of the conditional scalar dissipation rate is also examined. It is found to exhibit a unique non-monotonicity of the exponent of the stretched exponential function, conventionally used to describe probability density function tails of such variables. The non-monotonicity is attributed to the detailed chemical structure of hydrogen-air flames in which heat release occurs close to the unburnt reactants at near free-stream temperatures.

Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights

Computational Statistics

Pébay, Philippe; Terriberry, Timothy B.; Kolla, Hemanth K.; Bennett, Janine C.

Formulas for incremental or parallel computation of second order central moments have long been known, and recent extensions of these formulas to univariate and multivariate moments of arbitrary order have been developed. Such formulas are of key importance in scenarios where incremental results are required and in parallel and distributed systems where communication costs are high. We survey these recent results, and improve them with arbitrary-order, numerically stable one-pass formulas which we further extend with weighted and compound variants. We also develop a generalized correction factor for standard two-pass algorithms that enables the maintenance of accuracy over nearly the full representable range of the input, avoiding the need for extended-precision arithmetic. We then empirically examine algorithm correctness for pairwise update formulas up to order four as well as condition number and relative error bounds for eight different central moment formulas, each up to degree six, to address the trade-offs between numerical accuracy and speed of the various algorithms. Finally, we demonstrate the use of the most elaborate among the above mentioned formulas, with the utilization of the compound moments for a practical large-scale scientific application.
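As a concrete instance of the long-known second-order case this abstract starts from, the following Python sketch merges the count, mean, and sum of squared deviations of two partitions with the standard pairwise update formula. It does not implement the paper's arbitrary-order, weighted, or compound formulas; the class `Moments` and the helpers `merge` and `from_values` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def variance(self):
        """Population variance; assumes n > 0."""
        return self.m2 / self.n


def merge(a, b):
    """Combine the moments of two disjoint partitions in one step."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def from_values(values):
    """One-pass accumulation: merge each value as a partition of size one."""
    acc = Moments()
    for x in values:
        acc = merge(acc, Moments(1, float(x), 0.0))
    return acc


# Usage: two "ranks" summarize their local data, then exchange only
# three numbers each instead of the raw samples.
data = [1, 2, 3, 4, 5, 6, 7, 8]
left = from_values(data[:3])
right = from_values(data[3:])
total = merge(left, right)
print(total.n, total.mean, total.variance())
```

Because `merge` needs only the three summary values from each side, it serves both the incremental and the parallel setting, which is why such formulas matter when communication costs are high.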

Velocity and Reactive Scalar Dissipation Spectra in Turbulent Premixed Flames

Combustion Science and Technology

Kolla, Hemanth K.; Zhao, Xin Y.; Chen, Jacqueline H.; Swaminathan, N.

Dissipation spectra of velocity and reactive scalars—temperature and fuel mass fraction—in turbulent premixed flames are studied using direct numerical simulation data of a temporally evolving lean hydrogen-air premixed planar jet (PTJ) flame and a statistically stationary planar lean methane-air (SP) flame. The equivalence ratio in both cases was 0.7, the pressure 1 atm while the unburned temperature was 700 K for the hydrogen-air PTJ case and 300 K for methane-air SP case, resulting in data sets with a density ratio of 3 and 5, respectively. The turbulent Reynolds numbers for the cases ranged from 200 to 428.4, the Damköhler number from 3.1 to 29.1, and the Karlovitz number from 0.1 to 4.5. The dissipation spectra collapse when normalized by the respective Favre-averaged dissipation rates. However, the normalized dissipation spectra in all the cases deviate noticeably from those predicted by classical scaling laws for constant-density turbulent flows and bear a clear influence of the chemical reactions on the dissipative range of the energy cascade.

Towards asynchronous many-task in situ data analysis using legion

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Pébaÿ, Philippe; Bennett, Janine C.; Hollman, David S.; Treichler, Sean; McCormick, Patrick S.; Sweeney, Christine M.; Kolla, Hemanth K.; Aiken, Alex

We explore the use of asynchronous many-task (AMT) programming models for the implementation of in situ analysis towards the goal of maximizing programmer productivity and overall performance on next generation platforms. We describe how a broad class of statistics algorithms can be transformed from a traditional single-program multiple-data (SPMD) implementation to an AMT implementation, demonstrating with a concrete example: a measurement of descriptive statistics implemented in Legion. Our experiments to quantify the benefit and possible drawbacks of this approach are in progress, and we present some encouraging initial results on the (minimal) impact of the AMT-based approach on code complexity, task scheduling, and application scalability.

DARMA 0.3.0-alpha Specification

Wilke, Jeremiah J.; Hollman, David S.; Slattengren, Nicole S.; Lifflander, Jonathan L.; Kolla, Hemanth K.; Rizzi, Francesco N.; Teranishi, Keita T.; Bennett, Janine C.

DARMA (Distributed Asynchronous Resilient Models and Applications) [...] asynchronous many-task (AMT) programming models [...] and hardware idiosyncrasies, 2) improve [...] application programmer interface (API) [...] application co-design activities into meaningful requirements for runtime systems [...] characterization and definition, accelerating the development of AMT [...]. DARMA is a translation layer between an application-facing front end API and a [...] back end API. The application-facing user-level front end API is [...] in C++, inheriting the generic language constructs of C++ and adding [...] that facilitate expressing distributed asynchronous parallel programs. Though the implementation of the front end uses C++ constructs unfamiliar to many programmers to provide the front end semantics, it is nonetheless fully embedded in the C++ language and leverages a widely supported subset of C++14 functionality (gcc >= 4.9, clang >= 3.5, icc >= 16). The translation layer leverages C++ template [...] to map the user's [...] code onto the back end runtime API. The back end API is a set of abstract classes and function signatures that runtime system developers must implement in accordance with the specification requirements in order to interface with application code written to the DARMA front end. Executable DARMA applications must link to a runtime system that implements the abstract runtime API. It is intended that these implementations will be external, drawing upon existing technologies. However, a reference implementation will be provided in the DARMA code distribution. The front end, translation layer, and back end API are detailed herein. We also include a list of application requirements driving the specification (along with a list of the applications contributing to the requirements to date), a brief history of changes between previous versions of the specification, and a summary of the planned changes in upcoming versions of the specification. Appendices walk the user through a more detailed set of examples of applications written in the DARMA front end API and provide additional technical details for the interested reader.

Local recovery and failure masking for stencil-based applications at extreme scales

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Gamell, Marc; Teranishi, Keita T.; Heroux, Michael A.; Mayo, Jackson M.; Kolla, Hemanth K.; Chen, Jacqueline H.; Parashar, Manish

Application resilience is a key challenge that has to be addressed to realize the exascale vision. Online recovery, even when it involves all processes, can dramatically reduce the overhead of failures as compared to the more traditional approach where the job is terminated and restarted from the last checkpoint. In this paper we explore how local recovery can be used for certain classes of applications to further reduce overheads due to resilience. Specifically we develop programming support and scalable runtime mechanisms to enable online and transparent local recovery for stencil-based parallel applications on current leadership class systems. We also show how multiple independent failures can be masked to effectively reduce the impact on the total time to solution. We integrate these mechanisms with the S3D combustion simulation, and experimentally demonstrate (using the Titan Cray-XK7 system at ORNL) the ability to tolerate high failure rates (i.e., node failures every 5 seconds) with low overhead while sustaining performance, at scales up to 262144 cores.

Scalable Parallel Distance Field Construction for Large-Scale Applications

IEEE Transactions on Visualization and Computer Graphics

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan L.; Kolla, Hemanth K.; Chen, Jacqueline H.

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.

ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms

Baker, Gavin M.; Bettencourt, Matthew T.; Bova, S.W.; Franko, Ken F.; Gamell, Marc G.; Grant, Ryan E.; Hammond, Simon D.; Hollman, David S.; Knight, Samuel K.; Kolla, Hemanth K.; Lin, Paul L.; Olivier, Stephen O.; Sjaardema, Gregory D.; Slattengren, Nicole L.; Teranishi, Keita T.; Wilke, Jeremiah J.; Bennett, Janine C.; Clay, Robert L.; Kale, Laxkimant K.; Jain, Nikhil J.; Mikida, Eric M.; Aiken, Alex A.; Bauer, Michael B.; Lee, Wonchan L.; Slaughter, Elliott S.; Treichler, Sean T.; Berzins, Martin B.; Harman, Todd H.; Humphreys, Alan H.; Schmidt, John S.; Sunderland, Dan S.; McCormick, Pat M.; Gutierrez, Samuel G.; Shulz, Martin S.; Gamblin, Todd G.; Bremer, Peer-Timo B.

This report provides in-depth information and analysis to help create a technical road map for developing next-generation programming models and runtime systems that support Advanced Simulation and Computing (ASC) workload requirements. The focus herein is on asynchronous many-task (AMT) model and runtime systems, which are of great interest in the context of "exascale" computing, as they hold the promise to address key issues associated with future extreme-scale computer architectures. This report includes a thorough qualitative and quantitative examination of three best-of-class AMT runtime systems: Charm++, Legion, and Uintah, all of which are in use as part of the ASC Predictive Science Academic Alliance Program II (PSAAP-II) Centers. The studies focus on each of the runtimes' programmability, performance, and mutability. Through the experiments and analysis presented, several overarching findings emerge. From a performance perspective, AMT runtimes show tremendous potential for addressing extreme-scale challenges. Empirical studies show an AMT runtime can mitigate performance heterogeneity inherent to the machine itself and that Message Passing Interface (MPI) and AMT runtimes perform comparably under balanced conditions. From a programmability and mutability perspective, however, none of the runtimes in this study are currently ready for use in developing production-ready Sandia ASC applications. The report concludes by recommending a co-design path forward, wherein application, programming model, and runtime system developers work together to define requirements and solutions. Such a requirements-driven co-design approach benefits the community as a whole, with widespread community engagement mitigating risk for both application developers and high-performance computing runtime system developers.

Evolving the message passing programming model via a fault-tolerant, object-oriented transport layer

FTXS 2015 - Proceedings of the 2015 Workshop on Fault Tolerance for HPC at eXtreme Scale, Part of HPDC 2015

Wilke, Jeremiah J.; Kolla, Hemanth K.; Teranishi, Keita T.; Hollman, David S.; Bennett, Janine C.; Slattengren, Nicole S.

In this position paper, we argue for improved fault tolerance of an MPI code by introducing lightweight virtualization into the MPI interface. In particular, we outline key-value store semantics for MPI send/recv calls, thereby creating a far more expressive programming model. The general message passing semantics and imperative style of MPI application codes would remain essentially unchanged. However, the additional expressibility of the programming model 1) enables the underlying transport layer to handle fault tolerance more transparently to the application developer, and 2) provides an evolutionary code path towards more declarative asynchronous programming models. The core contribution of this paper is an initial implementation of the DHARMA transport layer that provides the new, required functionality to support the MPI key-value store model.

Exploring failure recovery for stencil-based applications at extreme scales

HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing

Gamell, Marc; Teranishi, Keita T.; Heroux, Michael A.; Mayo, Jackson M.; Kolla, Hemanth K.; Chen, Jacqueline H.; Parashar, Manish

Application resilience is a key challenge that must be addressed in order to realize the exascale vision. Previous work has shown that online recovery, even when done in a global manner (i.e., involving all processes), can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. In this paper we suggest going one step further, and explore how local recovery can be used for certain classes of applications to reduce the overheads due to failures. Specifically we study the feasibility of local recovery for stencil-based parallel applications and we show how multiple independent failures can be masked to effectively reduce the impact on the total time to solution.

Lessons Learned from Porting the MiniAero Application to Charm++

Hollman, David S.; Bennett, Janine C.; Wilke, Jeremiah J.; Kolla, Hemanth K.; Lin, Paul L.; Slattengren, Nicole S.; Teranishi, Keita T.; Franko, Ken F.; Jain, Nikhil J.; Mikida, Eric M.

Abstract not provided.

Structure of hydrogen-rich transverse jets in a vitiated turbulent flow

Combustion and Flame

Lyra, Sgouria L.; Wilde, Benjamin; Kolla, Hemanth K.; Seitzman, Jerry M.; Lieuwen, Timothy C.; Chen, Jacqueline H.

This paper reports the results of a joint experimental and numerical study of the flow characteristics and flame structure of a hydrogen rich jet injected normal to a turbulent, vitiated crossflow of lean methane combustion products. Simultaneous high-speed stereoscopic PIV and OH PLIF measurements were obtained and analyzed alongside three-dimensional direct numerical simulations of inert and reacting JICF with detailed H2/CO chemistry. Both the experiment and the simulation reveal that, contrary to most previous studies of reacting JICF stabilized in low-to-moderate temperature air crossflow, the present conditions lead to a burner-attached flame that initiates uniformly around the burner edge. Significant asymmetry is observed, however, between the reaction zones located on the windward and leeward sides of the jet, due to the substantially different scalar dissipation rates. The windward reaction zone is much thinner in the near field, while also exhibiting significantly higher local and global heat release than the much broader reaction zone found on the leeward side of the jet. The unsteady dynamics of the windward shear layer, which largely control the important jet/crossflow mixing processes in that region, are explored in order to elucidate the important flow stability implications arising in the inert and reacting JICF. The paper concludes with an analysis of the ignition, flame characteristics, and global structure of the burner-attached flame. Chemical explosive mode analysis (CEMA) shows that the entire windward shear layer, and a large region on the leeward side of the jet, are highly explosive prior to ignition and are dominated by non-premixed flame structures after ignition. The predominantly mixing limited nature of the flow after ignition is examined by computing the Takeno flame index, which shows that ~70% of the heat release occurs in non-premixed regions.
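
As a rough illustration of the last step, the Takeno flame index is commonly taken as the dot product of the fuel and oxidizer mass-fraction gradients: positive values indicate premixed burning and negative values indicate non-premixed burning. The sketch below assumes that un-normalised form and shows how a heat-release-weighted non-premixed fraction could be tallied. The function names and the sample values are hypothetical and are not data from the paper.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def flame_index(grad_fuel, grad_oxidizer):
    """Un-normalised flame index: grad(Y_fuel) . grad(Y_oxidizer).

    > 0: gradients aligned (premixed); < 0: gradients opposed (non-premixed).
    """
    return dot(grad_fuel, grad_oxidizer)

def non_premixed_heat_release_fraction(samples):
    """Fraction of total heat release found where the flame index is negative.

    samples -- iterable of (grad_fuel, grad_oxidizer, heat_release) tuples
    """
    total = 0.0
    non_premixed = 0.0
    for grad_fuel, grad_oxidizer, heat_release in samples:
        total += heat_release
        if flame_index(grad_fuel, grad_oxidizer) < 0.0:
            non_premixed += heat_release
    return non_premixed / total if total > 0.0 else 0.0

# Made-up sample points, for illustration only.
samples = [
    ((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 6.0),  # opposed gradients: non-premixed
    ((1.0, 0.0, 0.0), (0.5, 0.0, 0.0), 2.0),   # aligned gradients: premixed
]
fraction = non_premixed_heat_release_fraction(samples)
```

In a real analysis the gradients would come from the simulation fields and the index is often normalised by the gradient magnitudes; both details are omitted here.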

Effect of fuel composition and differential diffusion on flame stabilization in reacting syngas jets in turbulent cross-flow

Combustion and Flame

Minamoto, Yuki M.; Kolla, Hemanth K.; Grout, Ray W.; Gruber, Andrea; Chen, Jacqueline H.

Three-dimensional direct numerical simulation results of a transverse syngas fuel jet in turbulent cross-flow of air are analyzed to study the influence of varying volume fractions of CO relative to H2 in the fuel composition on the near field flame stabilization. The mean flame stabilizes at a similar location for CO-lean and CO-rich cases despite the trend suggested by their laminar flame speed, which is higher for the CO-lean condition. To identify local mixtures having favorable mixture conditions for flame stabilization, explosive zones are defined using a chemical explosive mode timescale. The explosive zones related to flame stabilization are located in relatively low velocity regions. The explosive zones are characterized by excess hydrogen transported solely by differential diffusion, in the absence of intense turbulent mixing or scalar dissipation rate. The conditional averages show that differential diffusion is negatively correlated with turbulent mixing. Moreover, the local turbulent Reynolds number is insufficient to estimate the magnitude of the differential diffusion effect. Alternatively, the Karlovitz number provides a better indicator of the importance of differential diffusion. A comparison of the variations of differential diffusion, turbulent mixing, heat release rate and probability of encountering explosive zones demonstrates that differential diffusion predominantly plays an important role for mixture preparation and initiation of chemical reactions, closely followed by intense chemical reactions sustained by sufficient downstream turbulent mixing. The mechanism by which differential diffusion contributes to mixture preparation is investigated using the Takeno Flame Index. The mean Flame Index, based on the combined fuel species, shows that the overall extent of premixing is not intense in the upstream regions. 
However, the Flame Index computed based on individual contribution of H2 or CO species reveals that hydrogen contributes significantly to premixing, particularly in explosive zones in the upstream leeward region, i.e. at the preferred flame stabilization location. Therefore, a small amount of H2 diffuses much faster than CO, creating relatively homogeneous mixture pockets depending on the competition with turbulent mixing. These pockets, together with high H2 reactivity, contribute to stabilizing the flame at a consistent location regardless of the CO concentration in the fuel for the present range of DNS conditions.

Impact of multi-component diffusion in turbulent combustion using direct numerical simulations

Combustion and Flame

Bruno, Claudio; Sankaran, Vaidyanathan; Kolla, Hemanth K.; Chen, Jacqueline H.

This paper presents the results of DNS of a partially premixed turbulent syngas/air flame at atmospheric pressure. The objective was to assess the importance and possible effects of molecular transport on flame behavior and structure. To this purpose DNS were performed with two proprietary DNS codes and with three different molecular diffusion transport models: fully multi-component, mixture averaged, and imposing the Lewis number of all species to be unity. Results indicate that: (i) at the Reynolds numbers of the simulations (Re_turb = 600, Re = 8000), the choice of molecular diffusion model significantly affects the temperature and concentration fields; (ii) assuming Le = 1 for all species predicts temperatures up to 250 K higher than the physically realistic multi-component model; (iii) faster molecular transport of lighter species changes the local concentration field and affects reaction pathways and chemical kinetics. A possible explanation for these observations is provided in terms of species diffusion velocity, which is a strong function of gradients: thus, at sufficiently large Reynolds numbers, gradients and their effects tend to be large. The preliminary conclusion from these simulations seems to indicate molecular diffusion as the third important mechanism active in flames besides convective transport and kinetics. If confirmed by further DNS and measurements, molecular transport in high intensity turbulent flames will have to be realistically modeled to accurately predict emissions (gaseous and particulates) and other combustor performance metrics.

Three-dimensional topology of turbulent premixed flame interaction

Proceedings of the Combustion Institute

Griffiths, R.A.C.; Chen, J.H.; Kolla, Hemanth K.; Cant, R.S.; Kollmann, W.

The topology of turbulent premixed flames is analysed using data from Direct Numerical Simulation (DNS), with emphasis on the statistical geometry of flame-flame interaction. A general method for obtaining the critical points of line, surface and volume fields is outlined, and the method is applied to isosurfaces of reaction progress variable in a DNS configuration involving a pair of freely-propagating hydrogen-air flames in a field of intense shear-generated turbulence. A complete set of possible flame-interaction topologies is derived using the eigenvalues of the scalar Hessian, and the topologies are parametrised using a pair of shape factors. The frequency of occurrence of each type of topology is evaluated from the DNS dataset for two different Damköhler numbers. Different types of flame-interaction topology are found to be favoured in various regions of the turbulent flame, and the physical significance of each interaction is discussed.

Extreme-scale viability of collective communication for resilient task scheduling and work stealing

Proceedings of the International Conference on Dependable Systems and Networks

Wilke, Jeremiah J.; Bennett, Janine C.; Kolla, Hemanth K.; Teranishi, Keita T.; Slattengren, Nicole S.; Floren, John F.

Extreme-scale computing will bring significant changes to high performance computing system architectures. In particular, the increased number of system components is creating a need for software to demonstrate 'pervasive parallelism' and resiliency. Asynchronous, many-task programming models show promise in addressing both the scalability and resiliency challenges; however, they introduce an enormously challenging distributed, resilient consistency problem. In this work, we explore the viability of resilient collective communication in task scheduling and work stealing and, through simulation with SST/macro, the performance of these collectives on speculative extreme-scale architectures.

Structure and stabilization of hydrogen-rich transverse jets in a vitiated turbulent flow

Lyra, Sgouria L.; Kolla, Hemanth K.; Chen, Jacqueline H.; Wilde, B W.; Seitzman, J.S.; Lieuwen, T.C.L.

This paper reports the results of a joint experimental and numerical study of the flow characteristics and flame stabilization of a hydrogen rich jet injected normal to a turbulent, vitiated crossflow of lean methane combustion products. Simultaneous high-speed stereoscopic PIV and OH PLIF measurements were obtained and analyzed alongside three-dimensional direct numerical simulations of inert and reacting JICF with detailed H2/CO chemistry. Both the experiment and the simulation reveal that, contrary to most previous studies of reacting JICF stabilized in low-to-moderate temperature air crossflow, the present conditions lead to an autoigniting, burner-attached flame that initiates uniformly around the burner edge. Significant asymmetry is observed, however, between the reaction zones located on the windward and leeward sides of the jet, due to the substantially different scalar dissipation rates. The windward reaction zone is much thinner in the near field, while also exhibiting significantly higher local and global heat release than the much broader reaction zone found on the leeward side of the jet. The unsteady dynamics of the windward shear layer, which largely control the important jet/crossflow mixing processes in that region, are explored in order to elucidate the important flow stability implications arising in the reacting JICF. Vorticity spectra extracted from the windward shear layer reveal that the reacting jet is globally unstable and features two high frequency peaks, including a fundamental mode whose Strouhal number of ~0.7 agrees well with previous non-reacting JICF stability studies. The paper concludes with an analysis of the ignition, flame stabilization, and global structure of the burner-attached flame. Chemical explosive mode analysis (CEMA) shows that the entire windward shear layer, and a large region on the leeward side of the jet, are highly explosive prior to ignition and are dominated by non-premixed flame structures after ignition.
The predominantly mixing limited nature of the flow after ignition is confirmed by computing the Takeno flame index, which shows that ~70% of the heat release occurs in non-premixed regions.

In-Situ Feature Extraction of Large Scale Combustion Simulations Using Segmented Merge Trees

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Landge, Aaditya G.; Pascucci, Valerio; Gyulassy, Attila; Bennett, Janine C.; Kolla, Hemanth K.; Chen, Jacqueline H.; Bremer, Peer T.

The ever increasing amount of data generated by scientific simulations coupled with system I/O constraints are fueling a need for in-situ analysis techniques. Of particular interest are approaches that produce reduced data representations while maintaining the ability to redefine, extract, and study features in a post-process to obtain scientific insights. This paper presents two variants of in-situ feature extraction techniques using segmented merge trees, which encode a wide range of threshold based features. The first approach is a fast, low communication cost technique that generates an exact solution but has limited scalability. The second is a scalable, local approximation that nevertheless is guaranteed to correctly extract all features up to a predefined size. We demonstrate both variants using some of the largest combustion simulations available on leadership class supercomputers. Our approach allows state-of-the-art, feature-based analysis to be performed in-situ at significantly higher frequency than currently possible and with negligible impact on the overall simulation runtime.

Large Eddy Simulation of premixed flame flashback in a turbulent channel

52nd AIAA Aerospace Sciences Meeting - AIAA Science and Technology Forum and Exposition, SciTech 2014

Lietz, C.; Hassanaly, M.; Raman, V.; Kolla, Hemanth K.; Chen, J.; Gruber, A.

In the design of high-hydrogen content gas turbines for power generation, flashback of the turbulent flame by propagation through the low velocity boundary layers in the premixing region is an operationally dangerous event. Predictive models that could capture the onset of flashback would be indispensable in gas turbine design. For this purpose, modeling of the flashback process using the large eddy simulation (LES) approach is considered here. In particular, the goal is to understand the modeling requirements for predicting flashback in confined geometries. The flow configuration considered is a turbulent channel flow, for which high-fidelity direct numerical simulation (DNS) data already exists. A suite of LES calculations with different model formulations and filterwidths is considered. It is shown that LES predicts certain statistical properties of the flame front reasonably well, but fails to capture the propagation velocity accurately. It is found that the flashback process is invariant to changes in the initial conditions and additional near-wall grid refinement but the LES filterwidth as well as the subfilter models are found to be important even when the turbulence is almost fully resolved. From the computations, it is shown that for an LES model to predict flashback, sufficient resolution of the near-wall region, proper representation of the centerline acceleration caused by flame blockage, and appropriate modeling of the propagation of a wrinkled flame front near the center of the channel are considered the critical requirements.

A direct numerical simulation study of turbulence and flame structure in transverse jets analysed in jet-trajectory based coordinates

Journal of Fluid Mechanics

Grout, R.W.; Gruber, A.; Kolla, Hemanth K.; Bremer, P.T.; Bennett, J.C.; Gyulassy, A.; Chen, J.H.

An H2/N2 jet in cross-flow (JICF) of air is studied using three-dimensional direct numerical simulation with and without chemical reaction in order to investigate the role of the complex JICF turbulent flow field in the mechanism of fast fuel-oxidant mixing and of aerodynamic flame stabilization in the near field of the jet nozzle. Focus is on delineating the flow/mixing/chemistry conditions that are necessary and/or sufficient to achieve flame anchoring that ultimately enables the formulation of more reliable and precise guidelines for design of fuel injection nozzles. A mixture averaged diffusion formulation that includes the effect of thermal diffusion is used along with a detailed chemical kinetics mechanism for hydrogen-air combustion. A new parametrization technique is used to describe the jet trajectory: solution of Laplace's equation upon, and then within, an opportune scalar surface anchored by Dirichlet boundary conditions at the jet nozzle and plume exit from the domain provides a smoothly varying field along the jet path. The surface is selected to describe the scalar mixing and reaction associated with a transverse jet. The derived field, j(x), is used as a condition to mark the position along the natural jet trajectory when analysing the variation of relevant flow, mixing and reaction quantities in the present direct numerical simulation (DNS) datasets. Results indicate the presence of a correlation between the flame base location in parameter space and a region of low velocity magnitude, high enstrophy, high mixing rate and high equivalence ratio (flame root region). Instantaneously, a variety of vortical structures, well known from the literature as important contributors to fuel-oxidant mixing, are observed in both inert and reactive cases with a considerable span in length scales.
Moreover, instantaneous plots from reactive cases illustrate that the most upstream flame tongues propagate close to the trailing edge of the fuel jet potential core near the jet shear layer vortex shedding position. Some degree of asymmetry with respect to the domain mid-plane in the spanwise direction is observed in the averaged fields, both for the inert and reactive cases. © 2012 Cambridge University Press.
