Publications

Conditioning multi-model ensembles for disease forecasting

Ray, Jaideep R.; Cauthen, Katherine R.; Lefantzi, Sophia L.; Burks, Lynne B.

In this study we investigate how an ensemble of disease models can be conditioned on observational data, in a bid to improve its predictive skill. We use the ensemble of influenza forecasting models gathered by the US Centers for Disease Control and Prevention (CDC) as the exemplar. This ensemble is used every year to forecast the annual influenza outbreak in the United States. The models constituting this ensemble draw on very different modeling assumptions and approximations and are a diverse collection of methods to approximate epidemiological dynamics. Currently, each model's predictions are accorded the same importance, or weight, when compiling the ensemble's forecast. We consider this equally-weighted ensemble as the baseline case which has to be improved upon. In this study, we explore whether an ensemble forecast can be improved by "conditioning" the ensemble on whatever observational data is available from the ongoing outbreak. "Conditioning" can imply according the ensemble's members different weights which evolve over time, or simply performing the forecast using the top k (equally-weighted) models. In the latter case, the composition of the "top-k" set of models evolves over time. This is called "model averaging" in statistics. We explore four methods to perform model-averaging, three of which are new. We find that the CDC ensemble responds best to the "top-k models" approach to model-averaging. All the new MA methods perform better than the baseline equally-weighted ensemble. The four model-averaging methods treat the models as black boxes and simply use their forecasts as inputs, i.e., one does not need access to the models at all, but rather only their forecasts. The model-averaging approaches reviewed in this report thus form a general framework for model-averaging any model ensemble.
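
The "top-k" strategy is simple enough to sketch concretely. Below is a minimal Python illustration; the array layout and the mean-squared-error scoring metric are assumptions made for the example, not the CDC ensemble's actual interface. Models are ranked by how well their past forecasts match the observations available so far, and the k best are averaged with equal weights.

```python
import numpy as np

def top_k_forecast(forecasts, past_forecasts, observations, k):
    """Equally weighted average of the k models that best match
    the observations available so far.

    forecasts      : (n_models, horizon) forecasts of future weeks
    past_forecasts : (n_models, t) each model's forecast of weeks 0..t-1
    observations   : (t,) observed disease activity to date
    k              : number of top models to retain
    """
    # Score each model by mean squared error against observations so far.
    mse = np.mean((past_forecasts - observations) ** 2, axis=1)
    top = np.argsort(mse)[:k]           # indices of the k best models
    return forecasts[top].mean(axis=0)  # equal weights within the top-k set
```

Re-running this ranking as new observations arrive is what makes the composition of the top-k set evolve over the course of the outbreak.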

Robust Bayesian calibration of a k-ϵ model for compressible jet-in-crossflow simulations

AIAA Journal

Ray, Jaideep R.; DeChant, Lawrence J.; Lefantzi, Sophia L.; Ling, Julia; Arunajatesan, Srinivasan A.

Compressible jet-in-crossflow interactions are difficult to simulate accurately using Reynolds-averaged Navier-Stokes (RANS) models. This could be due to simplifications inherent in RANS or the use of inappropriate RANS constants estimated by fitting to experiments of simple or canonical flows. Our previous work on Bayesian calibration of a k-ϵ model to experimental data had led to a weak hypothesis that inaccurate simulations could be due to inappropriate constants more than model-form inadequacies of RANS. In this work, Bayesian calibration of k-ϵ constants to a set of experiments that span a range of Mach numbers and jet strengths has been performed. The variation of the calibrated constants has been checked to assess the degree to which parametric estimates compensate for RANS's model-form errors. An analytical model of jet-in-crossflow interactions has also been developed, and estimates of k-ϵ constants that are free of any conflation of parametric and RANS's model-form uncertainties have been obtained. It has been found that the analytical k-ϵ constants provide mean-flow predictions that are similar to those provided by the calibrated constants. Further, both of them provide predictions that are far closer to experimental measurements than those computed using "nominal" values of these constants simply obtained from the literature. It can be concluded that the lack of predictive skill of RANS jet-in-crossflow simulations is mostly due to parametric inadequacies, and our analytical estimates may provide a simple way of obtaining predictive compressible jet-in-crossflow simulations.

Learning an eddy viscosity model using shrinkage and Bayesian calibration: A jet-in-crossflow case study

ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part B: Mechanical Engineering

Ray, Jaideep R.; Lefantzi, Sophia L.; Arunajatesan, Srinivasan A.; DeChant, Lawrence J.

We demonstrate a statistical procedure for learning a high-order eddy viscosity model (EVM) from experimental data and using it to improve the predictive skill of a Reynolds-averaged Navier-Stokes (RANS) simulator. The method is tested in a three-dimensional (3D), transonic jet-in-crossflow (JIC) configuration. The process starts with a cubic eddy viscosity model (CEVM) developed for incompressible flows. It is fitted to limited experimental JIC data using shrinkage regression. The shrinkage process removes all the terms from the model, except an intercept, a linear term, and a quadratic one involving the square of the vorticity. The shrunk eddy viscosity model is implemented in a RANS simulator and calibrated, using vorticity measurements, to infer three parameters. The calibration is Bayesian and is solved using a Markov chain Monte Carlo (MCMC) method. A 3D probability density distribution for the inferred parameters is constructed, thus quantifying the uncertainty in the estimate. The phenomenal cost of using a 3D flow simulator inside an MCMC loop is mitigated by using surrogate models ("curve-fits"). A support vector machine classifier (SVMC) is used to impose our prior belief regarding parameter values, specifically to exclude nonphysical parameter combinations. The calibrated model is compared, in terms of its predictive skill, to simulations using uncalibrated linear and cubic EVMs. We find that the calibrated model, with one quadratic term, is more accurate than the uncalibrated simulator. The model is also checked at a flow condition at which it was not calibrated.
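
As a concrete illustration of the shrinkage step, here is a minimal Python sketch using L1-penalized (lasso) regression on synthetic data. The basis terms, coefficients, and penalty strength are stand-ins invented for the example; in the paper the columns would be the CEVM's tensor-basis terms evaluated at the measurement locations.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-ins for CEVM basis terms at n measurement points;
# in the real problem these come from measured strain and vorticity.
strain = rng.normal(size=n)
vort_sq = rng.normal(size=n) ** 2
higher_order = rng.normal(size=(n, 7))      # other cubic-model terms

X = np.column_stack([strain, vort_sq, higher_order])
y = 0.5 + 1.2 * strain + 0.8 * vort_sq + 0.05 * rng.normal(size=n)

# L1 shrinkage drives the coefficients of uninformative terms to
# (typically exactly) zero, leaving a parsimonious eddy viscosity model.
fit = Lasso(alpha=0.2).fit(X, y)
print("intercept:", fit.intercept_)
print("surviving terms:", np.nonzero(fit.coef_)[0])
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.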

K-ε Turbulence Model Parameter Estimates Using an Approximate Self-similar Jet-in-Crossflow Solution

AIAA Journal

DeChant, Lawrence J.; Ray, Jaideep R.; Lefantzi, Sophia L.; Ling, Julia L.; Arunajatesan, Srinivasan A.

The k-ε turbulence model has been described as perhaps “the most widely used complete turbulence model.” This family of heuristic Reynolds Averaged Navier-Stokes (RANS) turbulence closures is supported by a suite of model parameters that have been estimated by requiring that well-established canonical flows, such as homogeneous shear flow and log-law behavior, be reproduced. While this procedure does yield a set of so-called nominal parameters, it is abundantly clear that they do not provide a universally satisfactory turbulence model that is capable of simulating complex flows. Recent work on the Bayesian calibration of the k-ε model using jet-in-crossflow wind tunnel data has yielded parameter estimates that are far more predictive than nominal parameter values. In this paper, we develop a self-similar asymptotic solution for axisymmetric jet-in-crossflow interactions and derive analytical estimates of the parameters that were inferred using Bayesian calibration. The self-similar method utilizes a near-field approach to estimate the turbulence model parameters while retaining the classical far-field scaling to model flow field quantities. Our parameter values are seen to be far more predictive than the nominal values, as checked using RANS simulations and experimental measurements. They are also closer to the Bayesian estimates than the nominal parameters. A traditional simplified jet trajectory model is explicitly related to the turbulence model parameters and is shown to yield good agreement with measurement when utilizing the analytically derived turbulence model coefficients. Finally, the close agreement between the turbulence model coefficients obtained via Bayesian calibration and the analytically estimated coefficients derived in this paper is consistent with the contention that the Bayesian calibration approach is firmly rooted in the underlying physical description.

Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems

Parallel Computing

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Lefantzi, Sophia L.; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh R.; Stevenson, Joel O.

A detailed understanding of HPC applications' resource needs and their complex interactions with each other and with HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions, and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized, system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.

Online mapping and forecasting of epidemics using open-source indicators

Ray, Jaideep R.; Lefantzi, Sophia L.; Bauer, Joshua B.; Khalil, Mohammad K.; Rothfuss, Andrew J.; Cauthen, Katherine R.; Finley, Patrick D.; Smith, Halley S.

Open-source indicators have been proposed as a way of tracking and forecasting disease outbreaks. Some, such as meteorological data, are readily available as reanalysis products. Others, such as those derived from our online behavior (web searches, media articles, etc.), are gathered easily and are more timely than public health reporting. In this study we investigate how these datastreams may be combined to provide useful epidemiological information. The investigation is performed by building data assimilation systems to track influenza in California and dengue in India. The first does not suffer from incomplete data and was chosen to explore disease modeling needs. The second explores the case when observational data is sparse and disease modeling complexities are beside the point. The two test cases thus sit at opposite ends of the disease tracking spectrum. We find that data assimilation systems that produce disease activity maps can be constructed. Further, being able to combine multiple open-source datastreams is a necessity, as any one individually is not very informative. The data assimilation systems have very little in common except that they contain disease models, calibration algorithms and some ability to impute missing data. Thus, while the data assimilation systems share the goal of accurate forecasting, they are in practice designed to compensate for the shortcomings of their datastreams, and we expect them to be disease- and location-specific.

Bayesian parameter estimation of a k-ε model for accurate jet-in-crossflow simulations

AIAA Journal

Ray, Jaideep R.; Lefantzi, Sophia L.; Arunajatesan, Srinivasan A.; DeChant, Lawrence J.

Reynolds-averaged Navier–Stokes models are not very accurate for high-Reynolds-number compressible jet-in-crossflow interactions. The inaccuracy arises from the use of inappropriate model parameters and model-form errors in the Reynolds-averaged Navier–Stokes model. In this study, the hypothesis is pursued that Reynolds-averaged Navier–Stokes predictions can be significantly improved by using parameters inferred from experimental measurements of a supersonic jet interacting with a transonic crossflow.

Toward rapid understanding of production HPC applications and systems

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Lefantzi, Sophia L.; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh R.; Stevenson, Joel O.

A detailed understanding of HPC applications' resource needs and their complex interactions with each other and with HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions, and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized, system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.

A sparse reconstruction method for the estimation of multi-resolution emission fields via atmospheric inversion

Geoscientific Model Development

Ray, J.; Lee, Jina L.; Yadav, V.; Lefantzi, Sophia L.; Michalak, A.M.; van Bloemen Waanders, Bart G.

Atmospheric inversions are frequently used to estimate fluxes of atmospheric greenhouse gases (e.g., biospheric CO2 flux fields) at Earth's surface. These inversions typically assume that flux departures from a prior model are spatially smooth, and model them using a multivariate Gaussian. When the field being estimated is spatially rough, multivariate Gaussian models are difficult to construct and a wavelet-based field model may be more suitable. Unfortunately, such models are very high dimensional and are most conveniently used when the estimation method can simultaneously perform data-driven model simplification (removal of model parameters that cannot be reliably estimated) and fitting. Such sparse reconstruction methods are typically not used in atmospheric inversions. In this work, we devise a sparse reconstruction method, and illustrate it in an idealized atmospheric inversion problem for the estimation of fossil fuel CO2 (ffCO2) emissions in the lower 48 states of the USA. Our new method is based on stagewise orthogonal matching pursuit (StOMP), a method used to reconstruct compressively sensed images. Our adaptations bestow three properties on the sparse reconstruction procedure which are useful in atmospheric inversions: we have modified StOMP to incorporate prior information on the emission field being estimated and to enforce non-negativity on the estimated field, and, though based on wavelets, our method allows for the estimation of fields in non-rectangular geometries, e.g., emission fields inside geographical and political boundaries. Our idealized inversions use a recently developed multiresolution (i.e., wavelet-based) random field model for ffCO2 emissions and synthetic observations of ffCO2 concentrations from a limited set of measurement sites. We find that our method for limiting the estimated field within an irregularly shaped region is about a factor of 10 faster than conventional approaches. It also reduces the overall computational cost by a factor of 2. Further, the sparse reconstruction scheme imposes non-negativity without introducing strong nonlinearities, such as those introduced by employing log-transformed fields, and thus reaps the benefits of simplicity and computational speed that are characteristic of linear inverse problems.
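
For readers unfamiliar with stagewise matching pursuit, the Python sketch below conveys the core loop: correlate the residual with the dictionary columns, admit coefficients above a stagewise threshold, and refit on the active set. Here non-negativity is imposed through a non-negative least-squares refit; this is a simplification of the paper's adaptations (the prior-information modification is omitted), and all names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def stomp_nonneg(A, y, n_stages=10, thresh=2.0):
    """StOMP-style sparse reconstruction with a non-negativity constraint.

    A : (m, n) sensing matrix (atmospheric transport x wavelet basis)
    y : (m,) concentration observations
    """
    m, n = A.shape
    x = np.zeros(n)
    active = np.zeros(n, dtype=bool)
    for _ in range(n_stages):
        r = y - A @ x                              # current residual
        sigma = np.linalg.norm(r) / np.sqrt(m)     # formal noise level
        # Stagewise selection: columns strongly correlated with residual.
        active |= np.abs(A.T @ r) > thresh * sigma
        if not active.any():
            break
        # Refit on the active set; nnls enforces x >= 0 directly.
        x_act, _ = nnls(A[:, active], y)
        x[active] = x_act
    return x
```

The data-driven model simplification mentioned in the abstract corresponds to the active set staying small: wavelets that never correlate strongly with the residual are simply never estimated.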

Estimation of k-ε parameters using surrogate models and jet-in-crossflow data

Lefantzi, Sophia L.; Ray, Jaideep R.; Arunajatesan, Srinivasan A.; DeChant, Lawrence J.

We demonstrate a Bayesian method that can be used to calibrate computationally expensive 3D RANS (Reynolds-Averaged Navier-Stokes) models with complex response surfaces. Such calibrations, conditioned on experimental data, can yield turbulence model parameters as probability density functions (PDF), concisely capturing the uncertainty in the parameter estimates. Methods such as Markov chain Monte Carlo (MCMC) estimate the PDF by sampling, with each sample requiring a run of the RANS model. Consequently a quick-running surrogate is used instead of the RANS simulator. The surrogate can be very difficult to design if the model's response, i.e., the dependence of the calibration variable (the observable) on the parameters being estimated, is complex. We show how the training data used to construct the surrogate can be employed to isolate a promising and physically realistic part of the parameter space, within which the response is well-behaved and easily modeled. We design a classifier, based on treed linear models, to model the "well-behaved region". This classifier serves as a prior in a Bayesian calibration study aimed at estimating three k-ε parameters (Cμ, Cε2, Cε1) from experimental data of a transonic jet-in-crossflow interaction. The robustness of the calibration is investigated by checking its predictions of variables not included in the calibration data. We also check the limit of applicability of the calibration by testing at off-calibration flow regimes. We find that calibration yields turbulence model parameters which predict the flowfield far better than when the nominal values of the parameters are used. Substantial improvements are still obtained when we use the calibrated RANS model to predict jet-in-crossflow at Mach numbers and jet strengths quite different from those used to generate the experimental (calibration) data. Thus the primary reason for the poor predictive skill of RANS, when using nominal values of the turbulence model parameters, was parametric uncertainty, which was rectified by calibration. Post-calibration, the dominant contribution to model inaccuracies is the structural error in RANS.
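
The calibration loop itself can be sketched compactly. Below is a minimal Metropolis sampler in Python in which a cheap surrogate replaces the RANS run inside the likelihood and a classifier restricts the prior support. The functions `surrogate` and `is_physical`, and the Gaussian likelihood form, are hypothetical stand-ins for the paper's actual curve-fits, treed-linear-model classifier, and error model.

```python
import numpy as np

def mcmc_calibrate(surrogate, is_physical, y_obs, sigma, theta0,
                   n_iter=50_000, step=0.05, seed=0):
    """Metropolis sampling of p(theta | y_obs) with a surrogate likelihood.

    surrogate(theta)   : cheap approximation of the RANS observable
    is_physical(theta) : classifier; False means zero prior probability
    y_obs, sigma       : measurements and their noise level
    """
    rng = np.random.default_rng(seed)

    def log_post(theta):
        if not is_physical(theta):     # classifier acts as the prior support
            return -np.inf
        resid = y_obs - surrogate(theta)
        return -0.5 * np.sum((resid / sigma) ** 2)

    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.asarray(chain)   # samples approximate the parameter PDF
```

For the three-parameter problem described above, the returned chain would be histogrammed to produce the 3D posterior density over (Cμ, Cε2, Cε1).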

Kalman-filtered compressive sensing for high resolution estimation of anthropogenic greenhouse gas emissions from sparse measurements

Ray, Jaideep R.; Lee, Jina L.; Lefantzi, Sophia L.; van Bloemen Waanders, Bart G.

The estimation of fossil-fuel CO2 emissions (ffCO2) from limited ground-based and satellite measurements of CO2 concentrations will form a key component of the monitoring of treaties aimed at the abatement of greenhouse gas emissions. The limited nature of the measured data leads to a severely underdetermined estimation problem. If the estimation is performed at fine spatial resolutions, it can also be computationally expensive. In order to enable such estimations, advances are needed in the spatial representation of ffCO2 emissions, scalable inversion algorithms and the identification of observables to measure. To that end, we investigate parsimonious spatial parameterizations of ffCO2 emissions which can be used in atmospheric inversions. We devise and test three random field models, based on wavelets, Gaussian kernels and covariance structures derived from easily observed proxies of human activity. In doing so, we construct a novel inversion algorithm, based on compressive sensing and sparse reconstruction, to perform the estimation. We also address scalable ensemble Kalman filters as an inversion mechanism and quantify the impact of the Gaussian assumptions inherent in them. We find that the assumption does not appreciably impact the estimates of mean ffCO2 source strengths, but a comparison with Markov chain Monte Carlo estimates shows significant differences in the variance of the source strengths. Finally, we study whether the very different spatial natures of biogenic and ffCO2 emissions can be used to estimate them, in a disaggregated fashion, solely from CO2 concentration measurements, without extra information from products of incomplete combustion, e.g., CO. We find that this is possible during the winter months, though the errors can be as large as 50%.

A multiresolution spatial parametrization for the estimation of fossil-fuel carbon dioxide emissions via atmospheric inversions

Ray, Jaideep R.; Lee, Jina L.; Lefantzi, Sophia L.; van Bloemen Waanders, Bart G.

The estimation of fossil-fuel CO2 emissions (ffCO2) from limited ground-based and satellite measurements of CO2 concentrations will form a key component of the monitoring of treaties aimed at the abatement of greenhouse gas emissions. To that end, we construct a multiresolution spatial parametrization for fossil-fuel CO2 emissions (ffCO2), to be used in atmospheric inversions. Such a parametrization does not currently exist. The parametrization uses wavelets to accurately capture the multiscale, nonstationary nature of ffCO2 emissions and employs proxies of human habitation, e.g., images of lights at night and maps of built-up areas, to reduce the dimensionality of the multiresolution parametrization. The parametrization is used in a synthetic data inversion to test its suitability for use in atmospheric inverse problems. This linear inverse problem is predicated on observations of ffCO2 concentrations collected at measurement towers. We adapt a convex optimization technique, commonly used in the reconstruction of compressively sensed images, to perform sparse reconstruction of the time-variant ffCO2 emission field. We also borrow concepts from compressive sensing to impose boundary conditions, i.e., to limit ffCO2 emissions within an irregularly shaped region (the United States, in our case). We find that the optimization algorithm performs a data-driven sparsification of the spatial parametrization and retains only those wavelets whose weights could be estimated from the observations. Further, our method for the imposition of boundary conditions leads to an approximately tenfold computational saving over conventional means of doing so. We conclude with a discussion of the accuracy of the estimated emissions and the suitability of the spatial parametrization for use in inverse problems with a significant degree of regularization.
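
The proxy-based dimensionality reduction can be illustrated with a toy example using PyWavelets. A random binary image stands in for a nightlights map (purely illustrative data, not the paper's inputs); wavelets whose coefficients on the proxy vanish are dropped, shrinking the parametrization before any inversion is attempted.

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
# Toy stand-in for a human-habitation proxy (e.g., nightlights):
# a 64x64 map that is nonzero over roughly 10% of its pixels.
proxy = (rng.uniform(size=(64, 64)) > 0.9).astype(float)

# Multiresolution (wavelet) decomposition of the proxy map.
coeffs = pywt.wavedec2(proxy, "haar", level=3)
arr, _ = pywt.coeffs_to_array(coeffs)

# Keep only wavelets whose support overlaps the proxy; the rest are
# pruned from the emission-field parametrization.
mask = np.abs(arr) > 1e-12
print(f"retained {mask.sum()} of {mask.size} wavelet coefficients")
```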

DAKOTA: a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis

Adams, Brian M.; Bohnhoff, William J.; Dalbey, Keith D.; Eddy, John P.; Eldred, Michael S.; Hough, Patricia D.; Lefantzi, Sophia L.; Swiler, Laura P.; Vigil, Dena V.

The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a theoretical manual for selected algorithms implemented within the DAKOTA software. It is not intended as a comprehensive theoretical treatment, since a number of existing texts cover general optimization theory, statistical analysis, and other introductory topics. Rather, this manual is intended to summarize a set of DAKOTA-related research publications in the areas of surrogate-based optimization, uncertainty quantification, and optimization under uncertainty that provide the foundation for many of DAKOTA's iterative analysis capabilities.

Bayesian data assimilation for stochastic multiscale models of transport in porous media

Lefantzi, Sophia L.; Klise, Katherine A.; Salazar, Luke S.; Mckenna, Sean A.; van Bloemen Waanders, Bart G.; Ray, Jaideep R.

We investigate Bayesian techniques that can be used to reconstruct field variables from partial observations. In particular, we target fields that exhibit spatial structures with a large spectrum of lengthscales. Contemporary methods typically describe the field on a grid and estimate structures which can be resolved by it. In contrast, we address the reconstruction of grid-resolved structures as well as the estimation of statistical summaries of subgrid structures, which are smaller than the grid resolution. We perform this in two different ways: (a) via a physical (phenomenological), parameterized subgrid model that summarizes the impact of the unresolved scales at the coarse level, and (b) via multiscale finite elements, where specially designed prolongation and restriction operators establish the interscale link between the same problem defined on a coarse and fine mesh. The estimation problem is posed as a Bayesian inverse problem. Dimensionality reduction is performed by projecting the field to be inferred on a suitable orthogonal basis set, viz., the Karhunen-Loève expansion of a multi-Gaussian field. We first demonstrate our techniques on the reconstruction of a binary medium consisting of a matrix with embedded inclusions, which are too small to be grid-resolved. The reconstruction is performed using an adaptive Markov chain Monte Carlo method. We find that the posterior distributions of the inferred parameters are approximately Gaussian. We exploit this finding to reconstruct a permeability field with long but narrow embedded fractures (which are too fine to be grid-resolved) using scalable ensemble Kalman filters; this also allows us to address larger grids. Ensemble Kalman filtering is then used to estimate the values of hydraulic conductivity and specific yield in a model of the High Plains Aquifer in Kansas. Strong conditioning of the spatial structure of the parameters and the nonlinear aspects of the water-table aquifer create difficulty for the ensemble Kalman filter. We conclude with a demonstration of the use of multiscale stochastic finite elements to reconstruct permeability fields. This method, though computationally intensive, is general and can be used for multiscale inference in cases where a subgrid model cannot be constructed.
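
For reference, the ensemble Kalman analysis step used in such reconstructions has a compact textbook form, sketched below in Python with perturbed observations. This is a generic stochastic EnKF update on hypothetical arrays, not the paper's scalable implementation.

```python
import numpy as np

def enkf_update(X, y_obs, H, R_std, rng=None):
    """Stochastic ensemble Kalman filter analysis step.

    X     : (n_state, n_ens) ensemble of field/parameter realizations
    y_obs : (n_obs,) observations
    H     : (n_obs, n_state) linear observation operator
    R_std : observation noise standard deviation
    """
    if rng is None:
        rng = np.random.default_rng()
    n_obs, n_ens = len(y_obs), X.shape[1]

    # Ensemble anomalies in state and observation space.
    Xm = X - X.mean(axis=1, keepdims=True)
    HX = H @ X
    HXm = HX - HX.mean(axis=1, keepdims=True)

    # Kalman gain built from ensemble covariances.
    P_hh = HXm @ HXm.T / (n_ens - 1) + (R_std ** 2) * np.eye(n_obs)
    K = (Xm @ HXm.T / (n_ens - 1)) @ np.linalg.inv(P_hh)

    # Perturbed observations keep the analysis ensemble spread correct.
    Y = y_obs[:, None] + R_std * rng.normal(size=(n_obs, n_ens))
    return X + K @ (Y - HX)
```

In the Karhunen-Loève setting described above, the state vector would hold the expansion coefficients rather than the full field, which is what keeps the update affordable.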

Real-time characterization of partially observed epidemics using surrogate models

Safta, Cosmin S.; Ray, Jaideep R.; Sargsyan, Khachik S.; Lefantzi, Sophia L.

We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to the estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as to provide gross information on the dynamics of the etiologic agent in the affected population, e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiological parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes, without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model on a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models are very close to those obtained with the original model. We also find that the number of projections required to construct a surrogate model is a factor of O(10)-O(10²) smaller than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.
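
The projection step that builds such a surrogate is short enough to show concretely. The Python sketch below projects a one-parameter toy model (an invented stand-in for an epidemic simulator) onto probabilists' Hermite polynomials via Gauss quadrature; afterwards, evaluating the surrogate is just evaluating a polynomial.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def model(xi):
    """Toy stand-in for an expensive epidemic simulator, as a function
    of one standardized (standard normal) input parameter xi."""
    return np.exp(0.3 * xi) / (1.0 + np.exp(-xi))

order = 6
nodes, weights = hermegauss(order + 1)  # quadrature for weight e^(-x^2/2)

# Galerkin projection: c_k = E[model * He_k] / E[He_k^2], with the
# expectations computed by quadrature (normalizations cancel in the ratio).
coeffs = np.zeros(order + 1)
f_at_nodes = model(nodes)
for k in range(order + 1):
    basis_k = np.zeros(k + 1)
    basis_k[k] = 1.0                    # coefficient vector selecting He_k
    he_k = hermeval(nodes, basis_k)
    coeffs[k] = np.sum(weights * f_at_nodes * he_k) / np.sum(weights * he_k**2)

# The surrogate is now a cheap polynomial evaluation.
print(model(0.5), hermeval(0.5, coeffs))  # should agree closely
```

The order-plus-one quadrature evaluations here play the role of the O(10)-O(10²) model projections mentioned above; MCMC then samples the cheap polynomial instead of the simulator.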
