Publications

Results 151–197 of 197

Efficient uncertainty quantification methodologies for high-dimensional climate land models

Sargsyan, Khachik S.; Safta, Cosmin S.; Berry, Robert D.; Ray, Jaideep R.; Debusschere, Bert D.; Najm, H.N.

In this report, we proposed, examined, and implemented approaches for performing efficient uncertainty quantification (UQ) in climate land models. Specifically, we applied a Bayesian compressive sensing framework to polynomial chaos spectral expansions, enhanced it with an iterative basis-reduction algorithm, and investigated the results on test models as well as on the Community Land Model (CLM). Furthermore, we discussed the construction of efficient quadrature rules for forward propagation of uncertainties from a high-dimensional, constrained input space to output quantities of interest. The work lays the groundwork for efficient forward UQ for high-dimensional, strongly nonlinear, and computationally costly climate models. Moreover, to investigate parameter-inference approaches, we applied two variants of the Markov chain Monte Carlo (MCMC) method to a soil-moisture dynamics submodel of the CLM. The evaluation of these algorithms gives us a good foundation for further building out the Bayesian calibration framework toward the goal of robust component-wise calibration.
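As a concrete illustration of the sparse-regression idea behind Bayesian compressive sensing on polynomial chaos expansions, the sketch below fits Legendre polynomial chaos coefficients to samples of a toy model with an L1-penalized (Lasso) solver. The Lasso stand-in, the toy model, and all parameter values are illustrative assumptions, not the report's implementation.

```python
# Minimal sketch, NOT the report's algorithm: a sparse polynomial chaos
# fit via Lasso, a rough stand-in for Bayesian compressive sensing.
import itertools
import numpy as np
from numpy.polynomial.legendre import legval
from sklearn.linear_model import Lasso

def pc_design_matrix(X, degree):
    """Tensor-product Legendre basis up to a given total degree.
    X: (n_samples, n_dims), inputs scaled to [-1, 1]."""
    n, d = X.shape
    mis = [m for m in itertools.product(range(degree + 1), repeat=d)
           if sum(m) <= degree]
    cols = []
    for m in mis:
        col = np.ones(n)
        for j, k in enumerate(m):
            col *= legval(X[:, j], [0.0] * k + [1.0])  # P_k(x_j)
        cols.append(col)
    return np.column_stack(cols), mis

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 5))       # 5 uncertain inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2        # toy model, sparse in PC basis
Phi, mis = pc_design_matrix(X, degree=3)
fit = Lasso(alpha=1e-3, fit_intercept=False).fit(Phi, y)
kept = [(m, c) for m, c in zip(mis, fit.coef_) if abs(c) > 1e-6]
print(f"retained {len(kept)} of {len(mis)} basis terms")
```

An iterative basis-reduction loop, as in the report, would re-fit on the retained terms; that loop is omitted here.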


Bayesian data assimilation for stochastic multiscale models of transport in porous media

Lefantzi, Sophia L.; Klise, Katherine A.; Salazar, Luke S.; Mckenna, Sean A.; van Bloemen Waanders, Bart G.; Ray, Jaideep R.

We investigate Bayesian techniques that can be used to reconstruct field variables from partial observations. In particular, we target fields that exhibit spatial structures with a large spectrum of lengthscales. Contemporary methods typically describe the field on a grid and estimate structures which can be resolved by it. In contrast, we address the reconstruction of grid-resolved structures as well as the estimation of statistical summaries of subgrid structures, which are smaller than the grid resolution. We do this in two different ways: (a) via a physical (phenomenological), parameterized subgrid model that summarizes the impact of the unresolved scales at the coarse level, and (b) via multiscale finite elements, where specially designed prolongation and restriction operators establish the interscale link between the same problem defined on a coarse and a fine mesh. The estimation problem is posed as a Bayesian inverse problem. Dimensionality reduction is performed by projecting the field to be inferred on a suitable orthogonal basis set, viz. the Karhunen-Loève expansion of a multiGaussian. We first demonstrate our techniques on the reconstruction of a binary medium consisting of a matrix with embedded inclusions that are too small to be grid-resolved. The reconstruction is performed using an adaptive Markov chain Monte Carlo method. We find that the posterior distributions of the inferred parameters are approximately Gaussian. We exploit this finding to reconstruct a permeability field with long but narrow embedded fractures (which are too fine to be grid-resolved) using scalable ensemble Kalman filters; this also allows us to address larger grids. Ensemble Kalman filtering is then used to estimate the values of hydraulic conductivity and specific yield in a model of the High Plains Aquifer in Kansas. Strong conditioning of the spatial structure of the parameters and the nonlinear aspects of the water-table aquifer create difficulty for the ensemble Kalman filter. We conclude with a demonstration of the use of multiscale stochastic finite elements to reconstruct permeability fields. This method, though computationally intensive, is general and can be used for multiscale inference in cases where a subgrid model cannot be constructed.
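A minimal sketch of the dimensionality-reduction step described above, assuming a squared-exponential covariance on a 1-D grid: the field to be inferred is expanded in a truncated Karhunen-Loève basis, so the objects of inference reduce to a handful of KL weights. The kernel, correlation length, and truncation level are illustrative assumptions.

```python
# Minimal sketch: truncated Karhunen-Loeve expansion of a multiGaussian
# field on a 1-D grid (kernel and truncation level are assumptions).
import numpy as np

n, ell = 200, 0.1                               # grid size, correlation length
x = np.linspace(0.0, 1.0, n)
C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
vals, vecs = np.linalg.eigh(C)                  # ascending eigenvalues
vals = np.clip(vals[::-1], 0.0, None)           # descending; guard round-off
vecs = vecs[:, ::-1]

k = 20                                          # retained modes
rng = np.random.default_rng(1)
xi = rng.standard_normal(k)                     # KL weights (inferred by MCMC)
field = vecs[:, :k] @ (np.sqrt(vals[:k]) * xi)  # one field realization
print(f"top {k} modes carry {vals[:k].sum() / vals.sum():.1%} of the variance")
```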


Deriving a model for influenza epidemics from historical data

Ray, Jaideep R.

In this report we describe how we create a model for influenza epidemics from historical data collected from both civilian and military societies. We derive the model when the population of the society is unknown but the size of the epidemic is known. Our interest lies in estimating a time-dependent infection rate to within a multiplicative constant. The model form fitted is chosen for its similarity to published models for HIV and plague, enabling the application of Bayesian techniques to discriminate among infectious agents during an emerging epidemic. We have developed models for the progression of influenza in human populations. The model is framed as an integral and predicts the number of people who exhibit symptoms and seek care over a given time period; the start and end of the time period form the limits of integration. The disease-progression model, in turn, contains parameterized models for the incubation period and a time-dependent infection rate. The incubation-period model is obtained from the literature, and the parameters of the infection rate are fitted from historical data covering both military and civilian populations. The calibrated infection-rate models display a marked difference between the 1918 Spanish influenza pandemic on the one hand and the US influenza seasons of 2001-2008 and the progression of H1N1 in Catalunya, Spain, on the other. The data for the 1918 pandemic were obtained from military populations, while the rest are country-wide or province-wide data from the twenty-first century. We see that the initial growth of infection was about the same in all cases; however, military populations were able to control the epidemic much faster, i.e., the decay of the infection-rate curve is much steeper. It is not clear whether this was because of the much higher level of organization present in a military society or the seriousness with which the 1918 pandemic was addressed. Each outbreak to which the influenza model was fitted yields a separate set of parameter values. We suggest 'consensus' parameter values for military and civilian populations in the form of normal distributions so that they may be used in other applications. Representing the parameter values as distributions, instead of point values, allows us to capture the uncertainty and scatter in the parameters. Quantifying the uncertainty allows us to use these models in inverse problems, predictions under uncertainty, and various other studies involving risk.
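The integral framing can be sketched as follows: the expected number of people developing symptoms in a window [t1, t2] is the infection rate convolved with the incubation-period distribution, integrated over infection times. The lognormal incubation model and the rising-then-decaying rate below are illustrative placeholders, not the report's calibrated forms.

```python
# Minimal sketch of the integral disease-progression model; the incubation
# distribution and infection-rate form are illustrative assumptions.
import numpy as np
from scipy.stats import lognorm

def infection_rate(t, a=50.0, b=0.5):
    """Assumed time-dependent infection rate (infections/day)."""
    return a * t * np.exp(-b * t) * (t >= 0)

def expected_cases(t1, t2, t_max=60.0, n=4000):
    """Expected symptomatic presentations in [t1, t2] days."""
    s = np.linspace(0.0, t_max, n)              # candidate infection times
    inc = lognorm(s=0.5, scale=2.0)             # assumed incubation period
    w = inc.cdf(t2 - s) - inc.cdf(t1 - s)       # P(symptom onset in window)
    return float(np.sum(infection_rate(s) * w) * (s[1] - s[0]))

print(f"expected cases, days 5-6: {expected_cases(5.0, 6.0):.1f}")
```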


Real-time characterization of partially observed epidemics using surrogate models

Safta, Cosmin S.; Ray, Jaideep R.; Sargsyan, Khachik S.; Lefantzi, Sophia L.

We present a statistical method, predicated on the use of surrogate models, for the 'real-time' characterization of partially observed epidemics. Observations consist of counts of symptomatic patients, diagnosed with the disease, that may be available in the early epoch of an ongoing outbreak. Characterization, in this context, refers to the estimation of epidemiological parameters that can be used to provide short-term forecasts of the ongoing epidemic, as well as gross information on the dynamics of the etiologic agent in the affected population, e.g., the time-dependent infection rate. The characterization problem is formulated as a Bayesian inverse problem, and epidemiological parameters are estimated as distributions using a Markov chain Monte Carlo (MCMC) method, thus quantifying the uncertainty in the estimates. In some cases, the inverse problem can be computationally expensive, primarily due to the epidemic simulator used inside the inversion algorithm. We present a method, based on replacing the epidemiological model with computationally inexpensive surrogates, that can reduce the computational time to minutes without a significant loss of accuracy. The surrogates are created by projecting the output of an epidemiological model onto a set of polynomial chaos bases; thereafter, computations involving the surrogate model reduce to evaluations of a polynomial. We find that the epidemic characterizations obtained with the surrogate models are very close to those obtained with the original model. We also find that the number of projections required to construct a surrogate model is O(10)-O(10^2) smaller than the number of samples required by the MCMC to construct a stationary posterior distribution; thus, depending upon the epidemiological models in question, it may be possible to omit the offline creation and caching of surrogate models prior to their use in an inverse problem. The technique is demonstrated on synthetic data as well as observations from the 1918 influenza pandemic collected at Camp Custer, Michigan.
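In one dimension the projection step reduces to computing Legendre coefficients with Gaussian quadrature, after which 'running' the surrogate is a plain polynomial evaluation. The sketch below uses a toy stand-in for the epidemic simulator; the quadrature order and the model are assumptions.

```python
# Minimal sketch: a 1-D polynomial chaos surrogate built by projection
# with Gauss-Legendre quadrature (toy model; not the paper's simulator).
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

def model(xi):
    """Stand-in for an expensive epidemic-model output, xi in [-1, 1]."""
    return np.exp(0.8 * xi) / (1.0 + xi ** 2)

order = 8
nodes, weights = leggauss(order + 1)            # quadrature rule
coeffs = np.array([
    0.5 * (2 * k + 1) * np.sum(weights * model(nodes)
                               * legval(nodes, [0.0] * k + [1.0]))
    for k in range(order + 1)                   # <f, P_k> / <P_k, P_k>
])

xi = np.linspace(-1.0, 1.0, 5)
print(np.max(np.abs(legval(xi, coeffs) - model(xi))))  # surrogate error
```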


Truncated multiGaussian fields and effective conductance of binary media

Advances in Water Resources

Mckenna, Sean A.; Ray, Jaideep R.; Marzouk, Youssef; van Bloemen Waanders, Bart G.

Truncated Gaussian fields provide a flexible model for defining binary media with dispersed (as opposed to layered) inclusions. General properties of excursion sets on these truncated fields are coupled with a distance-based upscaling algorithm and approximations from point process theory to develop an estimation approach for effective conductivity in two dimensions. The estimate of effective conductivity is derived directly from knowledge of the kernel size used to create the multiGaussian field, defined as the full width at half maximum (FWHM), the truncation threshold, and the conductance values of the two modes. Therefore, instantiation of the multiGaussian field is not necessary for estimation of the effective conductance. The critical component of the effective medium approximation developed here is the mean distance between high-conductivity inclusions. This mean distance is characterized as a function of the FWHM, the truncation threshold, and the ratio of the two modal conductivities. Sensitivity of the resulting effective conductivity to this mean distance is examined for two levels of contrast in the modal conductances and different FWHM sizes. Results demonstrate that the FWHM is a robust measure of mean travel distance in the background medium. The resulting effective conductivities are accurate when compared to results obtained from effective media theory, distance-based upscaling, and numerical simulation. © 2011 Elsevier Ltd.
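A minimal sketch of the binary-medium construction described above: smooth white noise with a Gaussian kernel of a chosen FWHM, rescale to unit variance, and take the excursion set above a truncation threshold as the high-conductivity phase. The FWHM, threshold, and the two modal conductances below are illustrative values.

```python
# Minimal sketch: binary medium as an excursion set of a truncated
# multiGaussian field (all parameter values are illustrative).
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
fwhm = 12.0                                     # kernel width in grid cells
sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
field = gaussian_filter(rng.standard_normal((256, 256)), sigma)
field /= field.std()                            # unit-variance multiGaussian

threshold = 1.0                                 # truncation threshold
binary = field > threshold                      # dispersed inclusions
K = np.where(binary, 100.0, 1.0)                # two modal conductances
print(f"inclusion proportion: {binary.mean():.3f}")
```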


The effect of error models in the multiscale inversion of binary permeability fields

Ray, Jaideep R.; van Bloemen Waanders, Bart G.; Mckenna, Sean A.

We present results from a recently developed multiscale inversion technique for binary media, with emphasis on the effect of subgrid model errors on the inversion. Binary media are a useful fine-scale representation of heterogeneous porous media. Averaged properties of the binary field representations can be used to characterize flow through the porous medium at the macroscale. Both direct measurements of the averaged properties and upscaling are complicated and may not provide accurate results. However, it may be possible to infer upscaled properties of the binary medium from indirect measurements at the coarse scale. Multiscale inversion, performed with a subgrid model to connect the disparate scales, can also yield information on the fine-scale properties. We model the binary medium using truncated Gaussian fields and develop a subgrid model for the upscaled permeability based on excursion sets of those fields. The subgrid model requires as inputs an estimate of the proportion of inclusions at the block scale as well as some geometrical parameters of the inclusions, and predicts the effective permeability. The inclusion proportion is assumed to be spatially varying, modeled using Gaussian processes, and represented using a truncated Karhunen-Loève (KL) expansion. This expansion is used, along with the subgrid model, to pose a Bayesian inverse problem for the KL weights and the geometrical parameters of the inclusions. The model error is represented in two different ways: (1) as a homoscedastic error and (2) as a heteroscedastic error, dependent on the inclusion proportion and geometry. The error models impact the form of the likelihood function in the expression for the posterior density of the objects of inference. The problem is solved using an adaptive Markov chain Monte Carlo method, and joint posterior distributions are developed for the KL weights and the inclusion geometry. Effective permeabilities and tracer breakthrough times at a few 'sensor' locations (obtained by simulating a pump test) form the observables used in the inversion. The inferred quantities can be used to generate an ensemble of permeability fields, both upscaled and fine-scale, which are consistent with the observations. We compare the inferences developed using the two error models, in terms of the KL weights and the fine-scale realizations that could be supported by the coarse-scale inferences. Permeability differences are observed mainly in regions where the inclusion proportion is near the percolation threshold, where the subgrid model incurs its largest approximation error. These differences are also reflected in the tracer breakthrough times and the geometry of flow streamlines, as obtained from a permeameter simulation. The uncertainty due to subgrid model error is also compared to the uncertainty in the inversion due to incomplete data.
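The two error treatments change only the likelihood: a single variance shared by all observations versus a per-observation variance tied to the modeled state. A minimal sketch of the two Gaussian log-likelihoods follows; the Gaussian form is standard, but how the heteroscedastic variance depends on inclusion proportion and geometry is the paper's modeling choice and is left abstract here.

```python
# Minimal sketch: homoscedastic vs. heteroscedastic Gaussian log-likelihoods
# (the per-observation variance model is left as a caller-supplied input).
import numpy as np

def loglike_homoscedastic(y_obs, y_pred, sigma):
    """One model-error variance shared by every observation."""
    r = y_obs - y_pred
    return -0.5 * np.sum(r ** 2 / sigma ** 2 + np.log(2 * np.pi * sigma ** 2))

def loglike_heteroscedastic(y_obs, y_pred, sigma_i):
    """Per-observation std devs, e.g. larger where the subgrid model is
    least accurate (near the percolation threshold)."""
    r = y_obs - y_pred
    return -0.5 * np.sum(r ** 2 / sigma_i ** 2 + np.log(2 * np.pi * sigma_i ** 2))

y_obs = np.array([1.2, 0.9, 1.5])
y_pred = np.array([1.0, 1.0, 1.4])
print(loglike_homoscedastic(y_obs, y_pred, 0.2))
print(loglike_heteroscedastic(y_obs, y_pred, np.array([0.1, 0.2, 0.4])))
```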


Posterior predictive modeling using multi-scale stochastic inverse parameter estimates

Mckenna, Sean A.; Ray, Jaideep R.; van Bloemen Waanders, Bart G.

Multi-scale binary permeability field estimation from static and dynamic data is completed using Markov chain Monte Carlo (MCMC) sampling. The binary permeability field is defined as high-permeability inclusions within a lower-permeability matrix. Static data are obtained as measurements of permeability with support consistent with the coarse-scale discretization. Dynamic data are advective travel times along streamlines calculated through a fine-scale field and averaged for each observation point at the coarse scale. Parameters estimated at the coarse scale (30 x 20 grid) are the spatially varying proportion of the high-permeability phase and the length and aspect ratio of the high-permeability inclusions. From the non-parametric posterior distributions estimated for these parameters, a recently developed sub-grid algorithm is employed to create an ensemble of realizations representing the fine-scale (3000 x 2000) binary permeability field. Each fine-scale ensemble member is instantiated by convolution of an uncorrelated multiGaussian random field with a Gaussian kernel defined by the estimated inclusion length and aspect ratio. Since the multiGaussian random field is itself a realization of a stochastic process, the procedure for generating fine-scale binary permeability field realizations is also stochastic. Two different methods for performing posterior predictive tests are hypothesized, and different mechanisms for combining multiGaussian random fields with kernels defined from the MCMC sampling are examined. Posterior predictive accuracy of the estimated parameters is assessed against a simulated ground truth for predictions at both the coarse scale (effective permeabilities) and the fine scale (advective travel-time distributions). The two techniques for conducting posterior predictive tests are compared by their ability to recover the static and dynamic data. The skill of the inference and of the method for generating fine-scale binary permeability fields is evaluated through flow calculations on the resulting fine-scale realizations, comparing them against results obtained with the ground-truth fine-scale and coarse-scale permeability fields.
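The instantiation step lends itself to a short sketch: convolve an uncorrelated multiGaussian field with an anisotropic Gaussian kernel whose widths encode the inferred inclusion length and aspect ratio, then threshold at the quantile matching the estimated inclusion proportion. The grid size and parameter values below are illustrative (a small grid stands in for the 3000 x 2000 field).

```python
# Minimal sketch of one fine-scale binary realization; parameter values
# and the small grid are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def binary_realization(shape, length, aspect, proportion, seed=0):
    rng = np.random.default_rng(seed)
    # anisotropic kernel: (along, across) std devs in grid cells
    field = gaussian_filter(rng.standard_normal(shape),
                            sigma=(length, length / aspect))
    cut = np.quantile(field, 1.0 - proportion)  # hit the target proportion
    return field > cut

binary = binary_realization((300, 200), length=8.0, aspect=3.0,
                            proportion=0.25, seed=3)
print(f"realized inclusion proportion: {binary.mean():.3f}")
```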


Statistical techniques for the characterization of partially observed epidemics

Ray, Jaideep R.; Safta, Cosmin S.

These techniques appear promising for constructing an integrated, automated detect-and-characterize capability for epidemics: working off biosurveillance data, they provide information on the particular, ongoing outbreak. Potential uses are in crisis management, planning, and resource allocation; the parameter-estimation capability is ideal for providing the input parameters of an agent-based model, namely the index cases, the time of infection, and the infection rate. Non-communicable diseases are easier to characterize than communicable ones: small anthrax attacks can be characterized well with 7-10 days of post-detection data, plague takes longer, and large attacks are very easy.


Bayesian classification of partially observed outbreaks using time-series data

Safta, Cosmin S.; Ray, Jaideep R.

Results show that a time-series-based classification may be possible. For the test cases considered, the correct model can be selected and the number of index cases can be captured within ±σ with 5-10 days of data. The low signal-to-noise ratio makes the classification difficult for small epidemics. The problem statement is: (1) create Bayesian techniques to classify and characterize epidemics from a time series of ICD-9 codes (we call this time series a 'morbidity stream'), where it is assumed the morbidity stream has already set off an alarm (through a Kalman-filter anomaly detector); and (2) starting with a set of putative diseases, identify which disease or set of diseases fits the data best, and infer associated information about it, i.e., the number of index cases, the start time of the epidemic, the spread rate, etc.
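Once a (log-)evidence has been computed for each putative disease model, the classification itself reduces to a small Bayes-factor calculation. A minimal sketch with a uniform model prior is below; the disease names and evidence values are placeholders, and computing the evidences (e.g., from MCMC output) is the hard part omitted here.

```python
# Minimal sketch: posterior model probabilities from per-disease
# log-evidences under a uniform prior (numbers are placeholders).
import numpy as np

diseases = ["anthrax", "plague", "influenza"]
log_evidence = np.array([-142.3, -155.9, -149.1])   # placeholder values
w = np.exp(log_evidence - log_evidence.max())       # stabilized exponent
post = w / w.sum()
for d, p in zip(diseases, post):
    print(f"P({d} | morbidity stream) = {p:.3f}")
```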


A Bayesian method for inferring transmission chains in a partially observed epidemic

Ray, Jaideep R.; Marzouk, Youssef M.

We present a Bayesian approach for estimating transmission chains and rates in the Abakaliki smallpox epidemic of 1967. The epidemic affected 30 individuals in a community of 74; only the dates of appearance of symptoms were recorded. Our model assumes stochastic transmission of the infections over a social network. Distinct binomial random graphs model intra- and inter-compound social connections, while disease transmission over each link is treated as a Poisson process. Link probabilities and rate parameters are objects of inference. Dates of infection and recovery comprise the remaining unknowns. Distributions for smallpox incubation and recovery periods are obtained from historical data. Using Markov chain Monte Carlo, we explore the joint posterior distribution of the scalar parameters and provide an expected connectivity pattern for the social graph and infection pathway.
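A minimal sketch of the model's two stochastic ingredients, under assumed parameter values: a binomial random graph with distinct intra- and inter-compound link probabilities, and exponential (Poisson-process) transmission waiting times over each link. In the paper these probabilities and the rate are objects of inference; here they are fixed for illustration, as is the compound assignment.

```python
# Minimal sketch: two-level binomial random graph plus Poisson-process
# transmission times (all parameter values are illustrative).
import numpy as np

rng = np.random.default_rng(4)
n = 74                                      # community size (Abakaliki)
compound = rng.integers(0, 9, size=n)       # assumed compound membership
p_intra, p_inter = 0.6, 0.05                # link probabilities (inferred
                                            # in the paper; fixed here)
beta = 0.1                                  # transmission rate per day

adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in range(i + 1, n):
        p = p_intra if compound[i] == compound[j] else p_inter
        adj[i, j] = adj[j, i] = rng.random() < p

links = int(adj.sum()) // 2
wait = rng.exponential(1.0 / beta, size=links)  # per-link waiting times
print(f"links: {links}, mean transmission wait: {wait.mean():.1f} days")
```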


Parallel computing in enterprise modeling

Heath, Zach H.; Shneider, Max S.; Vanderveen, Keith V.; Allan, Benjamin A.; Ray, Jaideep R.

This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation that includes both discrete-event simulation and agent-based simulation. What simulations of this class share, and what differentiates them from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic, and social simulations are members of this class, in which things or people are organized, or self-organize, to produce a solution. Entity-based problems never have an a priori ergodic principle that would greatly simplify the calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. That said, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale required for realistic results. With the recent upheavals in the financial markets and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.


An improved bi-level algorithm for partitioning dynamic grid hierarchies

Ray, Jaideep R.

Structured adaptive mesh refinement (SAMR) methods are being widely used for computer simulations of various physical phenomena. Parallel implementations potentially offer realistic simulations of complex three-dimensional applications, but achieving good scalability for large-scale applications is non-trivial. Performance is limited by the partitioner's ability to efficiently use the underlying parallel computer's resources. Designed on sound SAMR principles, Nature+Fable is a hybrid, dedicated SAMR partitioning tool that brings together the advantages of both domain-based and patch-based techniques while avoiding their drawbacks. But the original bi-level partitioning approach in Nature+Fable is insufficient: for realistic applications, it regards frequently occurring bi-levels as 'impossible' and fails. This document describes an improved bi-level partitioning algorithm that successfully copes with all possible bi-levels. The improved algorithm uses the original approach side-by-side with a new, complementing approach, and a new, customized classification method lets it switch automatically between the two. This document describes the algorithms, discusses implementation issues, and presents experimental results. The improved version of Nature+Fable was found to handle realistic applications and to generate less load imbalance and a similar box count, but more communication, compared to the native, domain-based partitioner in the SAMR framework AMROC.


An improved bi-level algorithm for partitioning dynamic structured grid hierarchies

Ray, Jaideep R.; Steensland, Johan S.

Structured adaptive mesh refinement (SAMR) methods are being widely used for computer simulations of various physical phenomena. Parallel implementations potentially offer realistic simulations of complex three-dimensional applications, but achieving good scalability for large-scale applications is non-trivial. Performance is limited by the partitioner's ability to efficiently use the underlying parallel computer's resources. Designed on sound SAMR principles, Nature+Fable is a hybrid, dedicated SAMR partitioning tool that brings together the advantages of both domain-based and patch-based techniques while avoiding their drawbacks. But the original bi-level partitioning approach in Nature+Fable is insufficient: for realistic applications, it regards frequently occurring bi-levels as 'impossible' and fails. This document describes an improved bi-level partitioning algorithm that successfully copes with all possible bi-levels. The improved algorithm uses the original approach side-by-side with a new, complementing approach, and a new, customized classification method lets it switch automatically between the two. This document describes the algorithms, discusses implementation issues, and presents experimental results. The improved version of Nature+Fable was found to handle realistic applications and to generate less load imbalance and a similar box count, but more communication, compared to the native, domain-based partitioner in the SAMR framework AMROC.
