In this paper we study the efficacy of combining machine-learning methods with projection-based model reduction techniques for creating data-driven surrogate models of computationally expensive, high-fidelity physics models. Such surrogate models are essential for many-query applications, e.g., engineering design optimization and parameter estimation, where the high-fidelity model must be invoked many times in sequence. Surrogate models are usually constructed for individual scalar quantities. However, there are scenarios where a spatially varying field needs to be modeled as a function of the model’s input parameters. Here we develop a method to do so, using projections to represent spatial variability while a machine-learned model captures the dependence of the model’s response on the inputs. The method is demonstrated by modeling the heat flux and pressure on the surface of the HIFiRE-1 geometry in a Mach 7.16 turbulent flow. The surrogate model is then used to perform Bayesian estimation of freestream conditions and parameters of the SST (Shear Stress Transport) turbulence model embedded in the high-fidelity (Reynolds-Averaged Navier–Stokes) flow simulator, using shock-tunnel data. The paper provides the first-ever Bayesian calibration of a turbulence model for complex hypersonic turbulent flows. We find that the primary difficulties in estimating the SST model parameters are the limited information content of the heat flux and pressure measurements and the large model-form error encountered in a certain part of the flow.
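The projection-plus-ML construction can be sketched in a few lines: a POD basis (computed via an SVD of field snapshots) represents the spatial variability, and a regressor maps input parameters to POD coefficients. The data shapes, the synthetic snapshot generator, and the `MLPRegressor` choice below are illustrative assumptions, not the paper’s exact implementation.

```python
# Minimal sketch of a projection-based surrogate for a spatial field:
# a POD (SVD) basis captures spatial variability; a machine-learned
# regressor maps input parameters to the POD coefficients.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))   # 200 high-fidelity runs, 4 input parameters (hypothetical)
# stand-in "field" snapshots on a 500-point surface grid, one row per run
Y = np.stack([np.sin(3 * x[0] * np.linspace(0, 1, 500)) * x[1] for x in X])

Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.99) + 1  # retain 99% of snapshot energy
coeffs = (Y - Y_mean) @ Vt[:k].T                               # project snapshots onto POD basis

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, coeffs)

def predict_field(x_new):
    """Reconstruct the full spatial field at new input parameters."""
    return Y_mean + model.predict(np.atleast_2d(x_new)) @ Vt[:k]
```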
The capability to identify emergent technologies based upon easily accessed open-source indicators, such as publications, is important for decision-makers in industry and government. The scientific contribution of this work is a machine-learning approach to detecting the maturity of emerging technologies from publication counts. Time series of publication counts have universal features that distinguish emerging and growing technologies. We train an artificial neural network classifier, a supervised machine-learning algorithm, on these features to predict the maturity (emergent vs. growth) of an arbitrary technology. With a training set comprising 22 technologies, we obtain a classification accuracy ranging from 58.3% to 100%, with an average accuracy of 84.6% for six test technologies. To enhance classifier performance, we augmented the training corpus with synthetic time-series technology life-cycle curves, formed by calculating weighted averages of curves in the original training set. Training the classifier on the synthetic data set resulted in improved accuracy, ranging from 83.3% to 100% with an average accuracy of 90.4% for the test technologies. The performance of our classifier exceeds that of competing machine-learning approaches in the literature, which report average classification accuracies of at most 85.7%. Moreover, in contrast to current methods, our approach does not require subject-matter expertise to generate training labels, and it can be automated and scaled.
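The augmentation step lends itself to a short sketch: synthetic life-cycle curves formed as random weighted averages of curves drawn from the training set. The array names, and the assumption that the two parent curves share a maturity label (so the synthetic curve inherits it), are illustrative readings of the abstract, not the paper’s stated procedure.

```python
# Sketch of the data-augmentation idea: synthesize new technology
# life-cycle curves as convex combinations of same-label training curves.
import numpy as np

def augment(curves, labels, n_synthetic, seed=1):
    """curves: (n_technologies, n_years) array of publication counts;
    labels: (n_technologies,) array of maturity classes."""
    rng = np.random.default_rng(seed)
    out_x, out_y = [], []
    for _ in range(n_synthetic):
        label = rng.choice(np.unique(labels))                       # pick a maturity class
        idx = rng.choice(np.flatnonzero(labels == label), size=2,
                         replace=False)                             # two parents, same label
        w = rng.uniform()                                           # random mixing weight
        out_x.append(w * curves[idx[0]] + (1 - w) * curves[idx[1]])
        out_y.append(label)
    return np.array(out_x), np.array(out_y)
```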
Previous efforts determined a set of calibrated, optimal model parameter values for Reynolds-averaged Navier–Stokes (RANS) simulations of a compressible jet in crossflow (JIC) using a $k$–$\varepsilon$ turbulence model. These parameters were derived by comparing simulation results to particle image velocimetry (PIV) data from a complementary JIC experiment under a limited set of flow conditions. Here, a $k$–$\varepsilon$ model using both nominal and calibrated parameters is validated against PIV data acquired from a much wider variety of JIC cases, including a realistic flight vehicle. The results from the simulations using the calibrated model parameters showed considerable improvements over those using the nominal values, even for cases that were not used in the calibration procedure that defined the optimal parameters. This improvement is demonstrated using a number of quality metrics that test the spatial alignment of the jet core, the magnitudes of multiple flow variables, and the locations and strengths of vortices in the counter-rotating vortex cores on the PIV planes. These results suggest that the calibrated parameters are applicable well outside the specific flow case used in defining them and that, with the right model parameters, RANS solutions for the JIC can be improved significantly over those obtained with the nominal model.
We present a simple, near-real-time Bayesian method to infer and forecast a multiwave outbreak, and demonstrate it on the COVID-19 pandemic. The approach uses timely epidemiological data that have been widely available for COVID-19. It provides short-term forecasts of the outbreak’s evolution, which can then be used for medical resource planning. The method postulates one- and multiwave infection models, which are convolved with the incubation-period distribution to yield competing disease models. The disease models’ parameters are estimated via Markov chain Monte Carlo sampling, and information-theoretic criteria are used to select between the models for forecasting. The method is demonstrated on two- and three-wave COVID-19 outbreaks in California, New Mexico and Florida, as observed during Summer–Winter 2020. We find that the method is robust to noise and provides useful forecasts (along with uncertainty bounds), and that it reliably detected when the initial single-wave COVID-19 outbreaks transformed into successive surges as containment efforts in these states failed by the end of Spring 2020.
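A minimal sketch of the forward model follows: a multiwave infection-rate curve, built as a sum of pulses, is convolved with the incubation-period distribution to give expected daily symptomatic counts. The lognormal pulse shape, its parameter names, and the incubation distribution (parameters in the vicinity of published COVID-19 estimates) are illustrative assumptions, not the paper’s exact model.

```python
# Sketch of the convolution step: infections -> expected symptomatic cases.
import numpy as np
from scipy import stats

def infection_rate(t, waves):
    """waves: list of (N, t0, scale) pulse parameters, one pulse per wave."""
    rate = np.zeros_like(t, dtype=float)
    for N, t0, scale in waves:
        tau = np.clip(t - t0, 1e-6, None)             # time since wave onset (days)
        rate += N * stats.lognorm.pdf(tau, s=1.0, scale=scale)
    return rate

def expected_symptomatic(t, waves,
                         incubation=stats.lognorm(s=0.42, scale=np.exp(1.62))):
    """Convolve the infection-rate curve with the incubation-period PDF
    on a daily grid to get the expected daily symptomatic count."""
    inf = infection_rate(t, waves)
    inc = incubation.pdf(np.arange(len(t)))
    return np.convolve(inf, inc)[: len(t)]
```

Under the paper’s approach, the pulse parameters of the one- and multiwave variants would be inferred by MCMC and the competing models compared via information-theoretic criteria.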
Machine-learned models, specifically neural networks, are increasingly used as “closures” or “constitutive models” in engineering simulators to represent fine-scale physical phenomena that are too computationally expensive to resolve explicitly. However, these neural-net models of unresolved physical phenomena tend to fail unpredictably and are therefore not used in mission-critical simulations. In this report, we describe new methods to authenticate them, i.e., to determine the (physical) information content of their training datasets, to qualify the scenarios where they may be used, and to verify that the neural net, as trained, adheres to physics theory. We demonstrate these methods on a neural-net closure for turbulent phenomena used in the Reynolds-Averaged Navier–Stokes equations. We show the types of turbulent physics extant in our training datasets and, using a test flow of an impinging jet, identify the exact locations where the neural network would be extrapolating, i.e., where it would be used outside the feature space where it was trained. Using Generalized Linear Mixed Models, we also generate explanations of the neural net (à la Local Interpretable Model-agnostic Explanations) at prototypes placed in the training data and compare them with approximate analytical models from turbulence theory. Finally, we verify our findings by reproducing them using two different methods.
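One generic way to flag such extrapolation is a nearest-neighbor distance test in the scaled training feature space, sketched below; the quantile threshold and the distance criterion are illustrative choices, not necessarily the report’s exact test.

```python
# Sketch: flag query points far (in scaled feature space) from the closure's
# training data, relative to the training set's own nearest-neighbor distances.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

def fit_extrapolation_detector(X_train, quantile=0.99):
    scaler = StandardScaler().fit(X_train)
    Z = scaler.transform(X_train)
    nn = NearestNeighbors(n_neighbors=2).fit(Z)
    d, _ = nn.kneighbors(Z)                       # d[:, 0] is the zero self-distance
    threshold = np.quantile(d[:, 1], quantile)    # typical spacing of training points
    return scaler, nn, threshold

def is_extrapolating(X_query, scaler, nn, threshold):
    d, _ = nn.kneighbors(scaler.transform(X_query), n_neighbors=1)
    return d[:, 0] > threshold                    # True where the closure is untrusted
```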
In this paper we investigate the utility of one-dimensional convolutional neural network (CNN) models in epidemiological forecasting. Deep learning models, in particular variants of recurrent neural networks (RNNs), have been studied for ILI (Influenza-Like Illness) forecasting and have achieved higher forecasting skill than conventional models such as ARIMA. In this study, we adapt two neural networks that employ one-dimensional temporal convolutional layers as their primary building block, namely temporal convolutional networks and simple neural attentive meta-learners, for epidemiological forecasting. We then test them on influenza data from the US collected over 2010–2019. We find that epidemiological forecasting with CNNs is feasible and that their forecasting skill is comparable to, and at times superior to, that of plain RNNs. Thus CNNs and RNNs bring the power of nonlinear transformations to purely data-driven epidemiological models, a capability that heretofore has been limited to more elaborate mechanistic/compartmental disease models.
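The building block shared by both architectures is the causal, dilated one-dimensional convolution. A minimal PyTorch sketch is shown below; the channel counts, depth, and input sizes are illustrative, not the paper’s exact configurations.

```python
# Sketch of a causal, dilated 1-D convolutional block, the core of a
# temporal convolutional network (TCN) for time-series forecasting.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad so outputs see only the past
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                         # x: (batch, channels, time)
        y = self.conv(F.pad(x, (self.pad, 0)))    # causal padding on the time axis
        return self.act(y) + x                    # residual connection

# stack blocks with doubling dilations to grow the receptive field exponentially
tcn = nn.Sequential(*[CausalConvBlock(16, dilation=2**i) for i in range(4)])
x = torch.randn(8, 16, 52)                        # e.g., 52 weeks of 16 ILI features
print(tcn(x).shape)                               # torch.Size([8, 16, 52])
```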
For digital twins (DTs) to become a central fixture in mission-critical systems, a better understanding of their potential failure modes is required, along with quantification of uncertainty and the ability to explain a model’s behavior. These aspects are particularly important as the performance of a digital twin will evolve during model development and deployment for real-world operations.
We demonstrate a Bayesian method for the “real-time” characterization and forecasting of a partially observed COVID-19 epidemic. Characterization is the estimation of infection-spread parameters using daily counts of symptomatic patients. The method is designed to help guide medical resource allocation in the early epoch of the outbreak. The estimation problem is posed as one of Bayesian inference and solved using a Markov chain Monte Carlo technique. The data used in this study were sourced before the arrival of the second wave of infection in July 2020. The proposed modeling approach generally provides accurate forecasts, whether applied at the regional, state, or country level. The epidemiological model detected the flattening of the curve in California after public health measures were instituted. The method also detected different disease dynamics when applied to specific regions of New Mexico.
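The estimation step can be sketched with an off-the-shelf MCMC ensemble sampler: given daily symptomatic counts and a forward model of the kind sketched earlier, sample the posterior over the epidemic parameters. The negative-binomial likelihood, the flat bounded priors, and the parameter names below are illustrative assumptions, not the paper’s exact choices.

```python
# Sketch of Bayesian characterization: posterior sampling of single-wave
# epidemic parameters from daily symptomatic counts via the emcee sampler.
import numpy as np
import emcee
from scipy import stats

def log_posterior(theta, t, counts, forward_model):
    N, t0, scale, phi = theta
    if N <= 0 or scale <= 0 or phi <= 0 or not (0 <= t0 < t[-1]):
        return -np.inf                            # flat priors enforced via bounds
    mu = np.clip(forward_model(t, [(N, t0, scale)]), 1e-9, None)
    # negative-binomial likelihood accommodates overdispersed daily counts
    return stats.nbinom.logpmf(counts, n=phi, p=phi / (phi + mu)).sum()

def sample(t, counts, forward_model, n_walkers=32, n_steps=2000):
    rng = np.random.default_rng(2)
    p0 = np.abs(rng.normal([1e4, 10.0, 20.0, 5.0], 1.0, (n_walkers, 4)))
    sampler = emcee.EnsembleSampler(n_walkers, 4, log_posterior,
                                    args=(t, counts, forward_model))
    sampler.run_mcmc(p0, n_steps, progress=False)
    return sampler.get_chain(discard=n_steps // 2, flat=True)  # drop burn-in
```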