Quick Start

Configure Existing Experiment

The procedure is very simple (NEER: Navigate, Edit, Edit, Run):
  • Navigate to the example directory.
  • Edit the configuration file da_solver.inp.
  • Edit the experiment module file model_<Model Name>.py if needed.
  • Run the test case file <Model Name>_Pytest.py.
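
For instance, the last step can be run from the command line with pytest or programmatically, as in the short sketch below. The model name Lorenz96 is only a hypothetical placeholder; substitute the actual <Model Name> of the example directory you are working in.

    # Run an example's test case programmatically with pytest.
    # "Lorenz96" is a hypothetical placeholder for <Model Name>.
    import pytest

    pytest.main(["Lorenz96_Pytest.py", "-v"])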

Inside each example directory you will find a configuration file da_solver.inp that you can edit. Here is a list of options currently used by DAPack:

  • filter_name: Name of the data assimilation filter. This list will be extended when smoothers are added to DAPack. As of now, this can be one of the following (case-insensitive):

    • EnKF: A stochastic (perturbed-observations) version of the ensemble Kalman filter (EnKF).

    • SqrEnKF: A deterministic (square-root) version of the ensemble Kalman filter.

    • BootStrap_PF: Bootstrap particle filter.

    • PF: Particle filter with a resampling step. This implements sequential importance resampling (SIR).

    • HMC: This should be chosen if the Hybrid Monte-Carlo (HMC) sampler is to be used. This option covers each of the following cases:

      • Vanilla HMC with manual parameter tuning.
      • The No-UTurn-Sampler (NUTS).
      • Generalized NUTS for DA (Work-in-progress).

      The specific sampler is determined by the settings of the two variables Hamiltonian_integrator and mass_matrix_strategy (see also hamiltonian_sampling_strategy below).

  • particle_filter_resampling_strategy: The strategy used to resample states from the prior ensemble. Currently, systematic and stratified are implemented (see the resampling sketch after this list).

  • ensemble_size: The number of ensemble members to keep during the assimilation process.

  • initial_time: Beginning of the timespan of the experiment.

  • final_time: End of the timespan of the experiment.

  • cycle_length: Length of the assimilation cycle. Measurements will be made at multiples of that interval. The three variables initial_time, final_time, and cycle_length together define the filter timespan.

  • observation_operator_type: Type of the observation operator. For now, two types are available: linear and empirical. Both choices create a sparse matrix with ones corresponding to observed entries and zeros elsewhere. The operators implemented here are very simple, but more will be added. The layout of the linear observation operator depends on the value of observed_variables_jump (see the observation-operator sketch after this list).

  • observation_noise_type: Only Gaussian observation errors are implemented.

  • observation_noise_level: Standard deviation of observation errors is calculated as the product of observation_noise_level and the average magnitude of the signal over the timespan of the assimilation experiment. This is calculated per prognostic variable.

  • observation_spacing_type: Either fixed or random. If fixed is chosen, the observation time points are selected based on the variable observation_steps_per_filter_steps. If random is chosen, each time point in the filter timespan becomes an actual observation (assimilation) point based on a coin flip, with the probability of being picked set in the variable observation_chance_per_filter_steps (see the time-spacing snippet after this list).

  • observation_steps_per_filter_steps: Observation frequency in time. 1 means observations are made at each time instance in the timespan, 2 means observations are made at every other time instance, and so on. Only used if observation_spacing_type == fixed.

  • observation_chance_per_filter_steps: Probability of a time instance in the filter timespan being picked as an observation point. Only used if observation_spacing_type == random.

  • observed_variables_jump: This controls the observation frequency over the state grid points. 1 means all prognostic variables are observed at all grid points, 2 means all prognostic variables are observed only at every other grid point, and so on.

  • screen_output: Yes(True) or No(False). Controls whether to show numerical results on the screen or not.

  • screen_output_iter: The frequency of screen output, w.r.t. simulation cycles. Used only if screen_output == Yes.

  • file_output: Yes(True) or No(False). Controls whether to save numerical results to files on disk or not.

  • file_output_iter: The frequency of file output, w.r.t. simulation cycles. Used only if file_output == Yes.

  • file_output_means_only: Yes(True) or No(False). Controls whether to save (to files on disk) the ensemble means (Yes) or the whole ensembles (No).

  • decorrelate: Yes(True) or No(False). Create and apply a decorrelation operator to the background error covariance matrix. This is known as localization.

  • decorrelation_radius: Localization distance/radius.

  • periodic_decorrelation: If periodic boundaries are used, this should be set to True.

  • read_decorrelation_from_file: Yes(True) or No(False). If Yes, an HDF5 file named Decorr is expected, from which the decorrelation matrix is read. An exception will be thrown if the file is not in place.

  • background_errors_covariance_method: This creates a modeled version of the background error covariance matrix. Only two methods are currently implemented:

    • diagonal: This results in an uncorrelated (diagonal) covariance structure.
    • The second method builds a full covariance matrix, which may be decorrelated (localized) if requested by setting decorrelate=Yes.

    These options necessarily require background errors to be Gaussian; more options will be considered. In both cases, a standard deviation vector is calculated as the product of background_noise_level and the average magnitude of the signal over the timespan of the assimilation experiment, per prognostic variable. This vector is then either placed on the diagonal of the background error covariance matrix (if diagonal is chosen), or the dense (pre-localized) background error covariance matrix is set to the outer product of this vector with itself. A sketch of this construction, including localization and the hybrid update, appears after this list.

  • background_noise_level: See the previous point.

  • background_noise_type: Gaussian for now.

  • update_B_factor: The background error covariance matrix is updated as a linear combination of the modeled background error covariance matrix and a flow-dependent (ensemble-based) version. This factor multiplies the flow-dependent version: 1 means only the flow-dependent version is used, 0 means only the modeled version is used, and any value between 0 and 1 results in a hybrid background error covariance matrix.

  • model_errors_covariance_method: This creates a model error covariance matrix. We assume model errors are Gaussian; this will be investigated further later. The construction strategy is the same as for the background error covariance matrix, using the uncertainty level set in the variable model_noise_level.

  • model_error_steps_per_model_steps: Time frequency of adding model errors, w.r.t. the model time step. The model time step is set in the configuration file solver.inp in the model directory and can also be set in the setup function of the model class.

  • model_noise_type: Gaussian for now.

  • model_noise_level: See the previous two points.

  • use_sparse_packages: Yes(True) or No(False). Use sparse packages for matrix representation or not.

  • linear_system_solver: Either lu or splu, used for solving linear systems when required (e.g., to find the effect of \mathbf{B}^{-1} on a vector). Constructing the full inverse is, of course, avoided.

  • Hamiltonian_integrator: Symplectic integrator used to propagate the Hamiltonian system used in HMC. Available options are verlet, 2stage, 3stage, and 4stage (a minimal verlet sketch appears after this list).

  • Hamiltonian_step_size: Step size used in the symplectic Hamiltonian integrator. This is only an initial value if NUTS is used.

  • Hamiltonian_number_of_steps: Number of steps taken by the symplectic integrator between proposed points. This controls the length of the Hamiltonian trajectory. Not needed for NUTS.

  • Hamiltonian_burn_in_steps: Number of generated states (to reach convergence) before starting the sampling process.

  • Hamiltonian_mixing_steps: Number of states dropped between retained states after convergence. This is usually useful to reduce correlations and consequently increase independence of sampled states.

  • mass_matrix_strategy: The mass matrix is assumed to be diagonal. It can be a multiple of the identity matrix, but it is better to let it depend on the variances of the posterior. Available choices are identity, prior_variances, prior_precisions, modeled_variances, and modeled_precisions. We recommend the _precisions choices.

  • mass_matrix_scale_factor: This scales the diagonal of the mass matrix.

  • hamiltonian_sampling_strategy: Either HMC or NUTS. The former directs the sampler to use a fixed step size and number of steps in the symplectic integrator, while the latter activates the generalized NUTS with dual averaging.
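
To make the particle_filter_resampling_strategy options concrete, here is a minimal, self-contained sketch of the two strategies (generic textbook versions, not DAPack's internal code):

    import numpy as np

    def systematic_resample(weights, rng=None):
        """Indices of resampled particles (systematic resampling)."""
        rng = rng or np.random.default_rng()
        n = len(weights)
        # One uniform offset shared by n equally spaced points on [0, 1).
        positions = (rng.random() + np.arange(n)) / n
        return np.searchsorted(np.cumsum(weights), positions)

    def stratified_resample(weights, rng=None):
        """Indices of resampled particles (stratified resampling)."""
        rng = rng or np.random.default_rng()
        n = len(weights)
        # One independent uniform draw inside each of the n strata.
        positions = (rng.random(n) + np.arange(n)) / n
        return np.searchsorted(np.cumsum(weights), positions)

    # Resample a 5-member ensemble according to its normalized weights;
    # the returned indices point into the prior ensemble.
    w = np.array([0.1, 0.1, 0.5, 0.2, 0.1])
    print(systematic_resample(w))
    print(stratified_resample(w))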
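
The linear observation_operator_type and the observation_noise_level rule can be illustrated with the following sketch. This is an illustration of the idea for a single prognostic variable, not DAPack's implementation:

    import numpy as np
    import scipy.sparse as sp

    def linear_observation_operator(state_size, observed_variables_jump):
        """Sparse selection matrix: ones at observed grid points, zeros elsewhere."""
        observed = np.arange(0, state_size, observed_variables_jump)
        rows = np.arange(len(observed))
        data = np.ones(len(observed))
        return sp.csr_matrix((data, (rows, observed)),
                             shape=(len(observed), state_size))

    def observation_error_std(signal, observation_noise_level):
        """Noise level times the average signal magnitude over the timespan."""
        return observation_noise_level * np.mean(np.abs(signal))

    # Observe every other point of a 6-point grid (observed_variables_jump = 2).
    H = linear_observation_operator(6, observed_variables_jump=2)
    x = np.arange(6.0)
    print(H @ x)                                              # -> [0. 2. 4.]
    print(observation_error_std(np.sin(np.linspace(0, 10, 100)), 0.05))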
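
The two observation_spacing_type modes amount to choosing observation time indices in one of the following two ways (illustrative only):

    import numpy as np

    rng = np.random.default_rng()
    n_filter_steps = 100

    # observation_spacing_type == fixed, observation_steps_per_filter_steps = 4:
    fixed_obs_times = np.arange(0, n_filter_steps, 4)

    # observation_spacing_type == random, observation_chance_per_filter_steps = 0.25:
    random_obs_times = np.flatnonzero(rng.random(n_filter_steps) < 0.25)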
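
The background error covariance options (background_errors_covariance_method, decorrelate, decorrelation_radius, periodic_decorrelation, and update_B_factor) can be summarized by the sketch below; the Gaussian taper used for localization is one common choice and is not necessarily the operator DAPack builds internally:

    import numpy as np

    def modeled_background_covariance(signal, background_noise_level,
                                      method="diagonal"):
        """Modeled B; the std vector is noise level x average signal magnitude."""
        std = background_noise_level * np.mean(np.abs(signal), axis=0)
        if method == "diagonal":
            return np.diag(std ** 2)               # uncorrelated structure
        # Otherwise: dense (pre-localized) B from the outer product of the std vector.
        return np.outer(std, std)

    def localize(B, grid, decorrelation_radius, periodic=False, domain_length=None):
        """Element-wise (Schur) product of B with a distance-based taper."""
        d = np.abs(grid[:, None] - grid[None, :])
        if periodic:                               # periodic_decorrelation = Yes
            d = np.minimum(d, domain_length - d)
        taper = np.exp(-0.5 * (d / decorrelation_radius) ** 2)
        return B * taper

    def hybrid_B(B_modeled, B_ensemble, update_B_factor):
        """update_B_factor = 1 -> flow-dependent B, 0 -> modeled B, else hybrid."""
        return update_B_factor * B_ensemble + (1.0 - update_B_factor) * B_modeled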
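
The roles of Hamiltonian_step_size, Hamiltonian_number_of_steps, and the diagonal mass matrix can be seen in a generic verlet/leapfrog trajectory of the kind used to build an HMC proposal (a standard textbook sketch, not DAPack's integrator):

    import numpy as np

    def leapfrog(x, p, grad_neg_log_post, step_size, n_steps, mass_diag):
        """Verlet/leapfrog trajectory for an HMC proposal.

        step_size  <-> Hamiltonian_step_size
        n_steps    <-> Hamiltonian_number_of_steps (trajectory length)
        mass_diag  <-> diagonal of the mass matrix (mass_matrix_strategy)
        """
        x, p = x.copy(), p.copy()
        p = p - 0.5 * step_size * grad_neg_log_post(x)   # half momentum kick
        for _ in range(n_steps - 1):
            x = x + step_size * p / mass_diag            # full position drift
            p = p - step_size * grad_neg_log_post(x)     # full momentum kick
        x = x + step_size * p / mass_diag
        p = p - 0.5 * step_size * grad_neg_log_post(x)   # final half kick
        return x, p

    # A full HMC step would draw p ~ N(0, M), run the leapfrog trajectory,
    # and then accept or reject the proposal with a Metropolis test.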

Add New Model

This is a big task that we will keep working on to make much easier. Currently, the two easiest options are:

  • Add the model to the HyPar package by editing the source code. In this case, you will need to create a directory under the models directory, and a corresponding one under the examples directory. Follow the pattern in the other examples; you will find that all are almost the same except for naming. Of course, you can modify the model class file to override default functions given in the base classes.
  • Write the model class yourself in detail. We will add an example soon.

Add New Filter

You will need to edit two modules. First, add your assimilation cycle function and related functions to the module named _Assimilation_Filters. Second, edit the function DA_filtering_cycle() inside HyPar_DAFiltering to add the option corresponding to the name you chose for your filter. If you add statistical monitors to your filter, you may also want to update the function DA_filtering_process() inside HyPar_DAFiltering by updating the file output section(s).