Virtual mini-workshop: Statistics
Topics in Multiscale, Forecast Verification, and Data Assimilation
Revisiting SIAM-UQ 2020
June 7-10, 2021
Organizers:
Julie Bessac (Argonne National Lab., US)
Jochen Broecker (University of Reading, UK)
Emil Constantinescu (Argonne National Lab., US)
A key challenge associated with simulations and predictions of complex systems is to evaluate the quality of these datasets and the ability of the underlying model to reproduce physically relevant simulations. In statistics, one way to quantitatively evaluate and rank models is statistical scoring. This is typically based on scalar metrics and takes as input verification data and output from the model to be evaluated. When evaluating model simulations or predictions, one aims to detect bias, trends, outliers, or correlation misspecification. Methods to evaluate the quality of unidimensional outputs are reasonably well established. By contrast, the evaluation of multidimensional outputs or ensembles of outputs has been addressed in the literature only relatively recently and remains challenging. We will discuss the challenges associated with evaluating unidimensional and multidimensional simulations or predictions.
Analysis and modeling under uncertainty are increasingly critical for robust scientific simulations. Physics-based model simulations cannot resolve the mathematical model exactly, typically leaving out fine scales, which are either approximated or not represented. This results in uncertainties in their outputs that need to be characterized. A variety of stochastic methods have been developed to address these errors and uncertainty to better describe complex systems. We discuss new developments in sub-grid stochastic models, multiscale aspects, model reduction techniques, and the effect they have on Bayesian inversion and data assimilation applications.
Start time: 8am CT / 2pm UK / 3pm CET / 4pm KAUST
End time: 10.30am CT / 4.30pm UK / 5.30pm CET / 6.30pm KAUST
CT | 8.15 - 9.00am | 9.00 - 9.45am | 9.45 - 10.30am |
UK | 2.15 - 3.00pm | 3.00 - 3.45pm | 3.45 - 4.30pm |
CET | 3.15 - 4.00pm | 4.00 - 4.45pm | 4.45 - 5.30pm |
KAUST | 4.15 - 5.00pm | 5.00 - 5.45pm | 5.45 - 6.30pm |

Mon, 7/6 | Etienne Mémin | Hannah Christensen | Julie Bessac |
Tue, 8/6 | Discussion | Jochen Broecker | Emil Constantinescu |
Thu, 10/6 | Sebastian Buschow & Petra Friederichs | David Bolin | Discussion |
Julie Bessac (Argonne National Lab., US): Scale-aware statistical space-time characterization of sub-grid air-sea exchange variability
We present a statistical scale-aware space-time model for the sub-grid variability of air-sea exchanges driven by surface wind speed. Quantifying the influence of the sub-grid scales on the resolved scales in physics-based models is needed to better represent the entire system. In this work, we evaluate and model the difference between the true turbulent fluxes and those calculated using area-averaged wind speeds. This discrepancy is modelled in space and time, conditioned on the low-resolution fields, with a view to developing a stochastic wind-flux parameterization. A locally stationary space-time Gaussian process is used to model this discrepancy process. Additionally, the Gaussian process is formulated in a scale-aware fashion, meaning that the space-time correlation ranges depend on the resolution considered. The scale-aware capability is based on empirical observations from a systematic coarse-graining of a high-resolution model output dataset. It makes it possible to derive a stochastic parameterization of sub-grid variability at any resolution and to characterize statistically the space-time structure of the discrepancy process across scales.
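As a rough illustration of the discrepancy described above, the following sketch coarse-grains a high-resolution wind-speed field by block averaging and compares the averaged flux with the flux computed from the averaged wind. The function names and the quadratic `toy_flux` are assumptions made for this example only; they stand in for a real bulk flux formula and are not taken from the talk.

```python
import numpy as np

def block_mean(field, factor):
    """Coarse-grain a 2D field by averaging non-overlapping factor x factor blocks.

    Assumes both dimensions of `field` are divisible by `factor`.
    """
    f = np.asarray(field, dtype=float)
    ny, nx = f.shape
    return f.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def toy_flux(wind_speed):
    """Hypothetical nonlinear flux; a placeholder, not a real bulk formula."""
    return np.asarray(wind_speed, dtype=float) ** 2

def flux_discrepancy(wind_speed, factor):
    """Area-averaged high-resolution flux minus flux of the area-averaged wind."""
    averaged_flux = block_mean(toy_flux(wind_speed), factor)
    flux_of_average = toy_flux(block_mean(wind_speed, factor))
    return averaged_flux - flux_of_average

# One 2x2 block with sub-grid variability, coarse-grained to a single cell.
wind = np.array([[1.0, 3.0],
                 [1.0, 3.0]])
discrepancy = flux_discrepancy(wind, factor=2)
```

For a quadratic flux the discrepancy equals the variance of the wind speed within each block, so it vanishes only where the sub-grid wind is uniform. Changing `factor` changes the target resolution, which is the sense in which such a discrepancy depends on scale.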
David Bolin (KAUST, Saudi Arabia): Local scale invariance and robustness of proper scoring rules
Averages of proper scoring rules are often used to rank probabilistic forecasts. In many cases, the variances of the individual observations and of their predictive distributions vary across the terms of these averages. We show that some of the most popular proper scoring rules, such as the continuous ranked probability score (CRPS), which is the go-to score for ensemble forecasts of continuous observations, give more importance to observations with large uncertainty, which can lead to unintuitive rankings. To describe this issue, we define the concept of local scale invariance for scoring rules. A new class of generalized proper kernel scoring rules is derived, and as a member of this class we propose the scaled CRPS (SCRPS). This new proper scoring rule is locally scale invariant and therefore works in the case of varying uncertainty. Like the CRPS, it can be computed from ensemble forecast output and does not require the ability to evaluate the density of the forecast. We further define robustness of scoring rules, show why this is also an important concept for average scores, and derive new proper scoring rules that are robust against outliers.
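The scale dependence of the CRPS mentioned above can be seen in a small sketch. It uses the standard ensemble (kernel) estimator of the CRPS, mean |x_i - y| minus half the mean pairwise distance |x_i - x_j|; the function name `crps_ensemble` and the toy numbers are assumptions for illustration, and the SCRPS itself is not implemented here.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Kernel estimate of the CRPS for one observation and one ensemble.

    CRPS ~= mean|x_i - y| - 0.5 * mean|x_i - x_j| (lower is better).
    """
    x = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(x - obs))
    term_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - 0.5 * term_spread

# The same forecast situation expressed at two different scales.
small_scale = crps_ensemble([0.0, 1.0], 0.5)
large_scale = crps_ensemble([0.0, 10.0], 5.0)
```

Rescaling both the ensemble and the observation by a factor of 10 multiplies the score by 10, so in an average over many cases the high-variance cases dominate even when the forecasts are equally good in relative terms.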
Jochen Broecker (University of Reading, UK): Evaluating reliability of forecasting systems under serial correlation
A general problem in the statistical evaluation of forecast performance is the intertemporal correlation of the verification-forecast pairs. As an example, consider the rank histogram, a popular tool to assess the reliability of ensemble forecasting systems (that is, whether the ensembles can in fact be regarded as sampled from the relevant conditional distributions). If the system is reliable, the ranks are uniformly distributed, but they are not independent, so standard goodness-of-fit tests cannot be applied. On the other hand, assuming the forecasting system is reliable, the forecasts should, by definition, provide information about the correlation between the verification and themselves. This information is typically sufficient to formulate tests for reliability and determine the asymptotic distribution of the test statistic under minimal extraneous assumptions. Stratified rank histograms are an example that will be presented in detail.
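For readers unfamiliar with the rank histogram, a minimal sketch of how it is computed follows. The function name `rank_histogram` and the toy data are assumptions for illustration, and ties between members and the verification are ignored. The sketch only builds the histogram; it does not implement the serial-correlation-aware tests or the stratified rank histograms discussed in the talk.

```python
import numpy as np

def rank_histogram(ensembles, observations):
    """Count how often the verification falls at each rank within its ensemble.

    ensembles:    array of shape (n_cases, n_members)
    observations: array of shape (n_cases,)
    Returns counts for ranks 0..n_members (n_members + 1 bins).
    """
    ens = np.asarray(ensembles, dtype=float)
    obs = np.asarray(observations, dtype=float)
    # Rank = number of ensemble members strictly below the verification.
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

counts = rank_histogram([[1.0, 2.0, 3.0],
                         [1.0, 2.0, 3.0]],
                        [0.5, 2.5])
```

For a reliable system the counts are roughly flat over many cases. A naive chi-square test on these counts would treat the cases as independent, which is exactly the assumption the abstract warns against.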
Sebastian Buschow and Petra Friederichs (University of Bonn, Germany): Spatial Verification with Wavelets
This study demonstrates how wavelets can extract specific information about the scale-structure, directedness and preferred orientation of two fields being compared. The result is a series of scores that translate the abstract information resulting from the wavelet transform into robust, easily interpretable statements about the realism of the simulated correlation structure. Directional aspects, especially when patterns are too linear, too round, or oriented at the wrong angle, are not explicitly addressed by most existing verification tools. In addition, it is shown how the wavelets' localized nature can be exploited to visualize the local correlation structure on a map, quantify spatially varying displacement errors, or correct structural errors in a simple post-processing algorithm. Unlike other popular approaches in the literature, the novel techniques are not limited to the special case of precipitation verification. Provided that observations exist on a regular grid, wavelet-based scores can in principle be applied to any meteorological field of interest.
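To give a flavour of how a wavelet transform separates directional information, the sketch below computes the detail energies of a single-level 2D Haar transform using plain NumPy. This is a deliberately simplified stand-in chosen for illustration, not the transform or the scores used in the study, and the function name and band labels are assumptions.

```python
import numpy as np

def haar_detail_energy(field):
    """Energy in the three detail bands of a one-level 2D Haar transform.

    Assumes both dimensions of `field` are even. Each 2x2 block
    [[a, b], [c, d]] contributes one coefficient per band.
    """
    f = np.asarray(field, dtype=float)
    a, b = f[0::2, 0::2], f[0::2, 1::2]
    c, d = f[1::2, 0::2], f[1::2, 1::2]
    bands = {
        "across_columns": (a - b + c - d) / 2.0,  # variation from column to column
        "across_rows": (a + b - c - d) / 2.0,     # variation from row to row
        "diagonal": (a - b - c + d) / 2.0,        # checkerboard-like variation
    }
    return {name: float(np.sum(coeff ** 2)) for name, coeff in bands.items()}

# A 4x4 field of vertical stripes: values change only from column to column.
stripes = np.tile([0.0, 1.0], (4, 2))
energy = haar_detail_energy(stripes)
```

All of the detail energy of the striped field lands in the `across_columns` band. Comparing such band energies between a forecast and an observed field gives a crude indication of whether the simulated patterns have the wrong orientation.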
Hannah Christensen (Oxford, UK): Lessons learnt from coarse graining high-resolution simulations to constrain stochastic parametrisations
Stochastic parametrisations are used in weather and climate models to represent model error. Designing new stochastic schemes has been the target of much innovative research over the last decade, with a focus on developing physically motivated schemes. An attractive approach to constraining stochastic schemes is to make use of high-resolution simulations to measure the variability that is unresolved at lower resolutions. A coarse-graining approach can be used to filter the resolved from unresolved scales. But this is easier said than done! In this talk I’ll present an approach for deriving or constraining stochastic parametrisations using coarse-graining, but will flag up difficulties encountered, including: coarse graining over topography, grid-scale noise, model drifts and spin-up, and conservation properties. The goal is to foster discussion around best practices in this area, to enable the community to optimally leverage high-resolution simulations for forecast model development.
Emil Constantinescu (Argonne National Lab., US): Reformulating Stochastic Inverse Problems Constrained by Differential Equations by Using Scoring Rules
A key challenge associated with stochastic inverse problems is to evaluate the difference between observational datasets and the distribution of model simulations (i.e., the likelihood function). One way to quantitatively evaluate and rank models or different parameter combinations for a model is statistical scoring, which is typically based on scalar metrics that take as input observational data and the output distribution from the model to be evaluated. We will discuss these challenges in the context of solving stochastic inverse problems driven by differential equations, where we express the objective of the inverse problem as finding the forward distribution that best explains the distribution of the observations.
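The idea of using a scoring rule as the objective of an inverse problem can be sketched on a toy example. Everything here is an assumption made for illustration: the forward model is a simple location shift rather than a differential equation, the score is the ensemble CRPS estimator, and the minimization is a plain grid search. It is not the formulation presented in the talk.

```python
import numpy as np

def mean_crps(members, observations):
    """Average ensemble CRPS of one forecast ensemble over many observations."""
    x = np.asarray(members, dtype=float)
    y = np.asarray(observations, dtype=float)
    term_obs = np.mean(np.abs(x[None, :] - y[:, None]))
    term_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - 0.5 * term_spread

def forward_ensemble(theta, noise):
    """Hypothetical stochastic forward model: parameter plus random forcing."""
    return theta + noise

rng = np.random.default_rng(0)
true_theta = 2.0
observations = true_theta + rng.normal(size=200)  # synthetic data
noise = rng.normal(size=100)                      # fixed draws, reused for every theta

# Grid search for the parameter whose forward distribution scores best.
grid = np.linspace(0.0, 4.0, 41)
scores = [mean_crps(forward_ensemble(t, noise), observations) for t in grid]
theta_hat = float(grid[int(np.argmin(scores))])
```

Because the CRPS is a proper scoring rule, the average score is minimized in expectation when the forward distribution matches the distribution of the observations, so `theta_hat` should land near `true_theta`. Reusing the same noise draws for every candidate keeps the objective smooth in the parameter.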
Etienne Mémin (Inria, CNRS, Irstea, Université de Rennes I, IRMAR, France): Stochastic modelling of large-scale fluid flows
In this talk, I will describe a formalism, called modelling under location uncertainty (LU), for deriving large-scale stochastic representations of fluid flow dynamics in a systematic way. This formalism makes it possible to take the neglected small-scale effects into account in the evolution laws through the introduction of a random field.
The resulting dynamics is built from a stochastic representation of the Reynolds transport theorem. This formalism enables, in the very same way as in the deterministic case, a physically relevant derivation (i.e. from the usual conservation laws) of the sought evolution laws. We will in particular show how to systematically derive stochastic representations of flow dynamics. We will give several examples of simulations obtained with such systems and show how an ensemble of such realizations can be used in data assimilation or for uncertainty quantification.
Furthermore, this formalism brings into play very meaningful terms for turbulence modeling. As a matter of fact, it provides (i) a natural subgrid tensor expression representing the mixing of the resolved components by the unresolved components; (ii) a multiplicative random term associated with energy backscattering; and (iii) a modified advection that describes the so-called turbophoresis phenomenon, which tends to drive fluid particles from regions of high turbulence toward areas of lower turbulent kinetic energy. We will in particular focus on this last term and show its relevance for describing several physical situations (such as wall-law velocity profiles or wave mean-current interaction and the appearance of the so-called vortex force). This will emphasize the importance of modeling the inhomogeneity of the unresolved components.