Sampling Methods#

Entries on this page:

Note

All sampling classes (samplers) inherit the Sampler base class (Sampler) and each sampler is associated with a configurations class derived from the Sampler configurations base class (SamplerConfigs)

Rejection Sampling#

class RejectionSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, proposal=None, constraint_test=None, bounds=None)[source]#

Bases: SamplerConfigs

Configurations for the rejection sampler RejectionSampler. In addition to the attributes provided by SamplerConfigs, this configurations class provides the following attributes:

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • size (int | None) – dimension of the target distribution to sample

  • log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled

  • proposal (Proposal | None) – a proposal object to be used for generating new samples

  • constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

  • bounds (Sequence | None) – bounds of the domain of the target random variable (if any)

  • random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.

size: int | None#
log_density: Callable[[ndarray], float] | None#
proposal: Proposal | None#
constraint_test: Callable[[ndarray], bool] | None#
bounds: Sequence | None#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, proposal=None, constraint_test=None, bounds=None)#
class RejectionSampler(configs=None)[source]#

Bases: Sampler

Implementation of the rejection sampling algorithm with a predefined proposal.

Parameters:

configs (RejectionSamplerConfigs | dict | None) – a configurations object. See RejectionSamplerConfigs.

Note:
  • Validation of the configurations dictionary is taken care of in the super class

  • If a proposal is passed in the configurations, ‘constraint_test’ should be set to it. If a constraint test is passed both here and in the proposal, the one passed here will overwrite the one associated with the proposal

__init__(configs=None)[source]#

Initialize the random number generator

validate_configurations(configs, raise_for_invalid=True)[source]#

Check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used

Parameters:
  • configs (dict | RejectionSamplerConfigs) – full or partial (subset) configurations to be validated

  • raise_for_invalid (bool) – if True raise TypeError for invalid configrations type/key. Default True

Returns:

flag indicating whether passed configurations dictionary is valid or not

Raises:
Return type:

bool

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid

Raises:

PyOEDConfigsValidationError – if invalid configurations passed

sample(sample_size=1, full_diagnostics=False)[source]#

Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution

Parameters:
  • sample_size (int)

  • initial_state

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

Returns:

a list of samples collected from the target distributions

start_batch_rejection_sampling(sample_size, full_diagnostics=False)[source]#

Start the the rejection sampling procedure. Unlike start_rejection_sampling, this function works by proposing a batch of samples (from the proposal) of size equal to the number of remaining samples to be collected; those who are accepted and satisfy the underlying constraint_test if one is set in the configurations.

Parameters:
  • sample_size (int) – number of smaple points to generate/collect from the predefined target distribution

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

Remarks:

This will replace start_rejection_sampling for efficiency; timing will decide!

start_rejection_sampling(sample_size, full_diagnostics=False)[source]#

Start the the rejection sampling procedure.

Parameters:
  • sample_size (int) – number of smaple points to generate/collect from the predefined target distribution

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

log_density(state)[source]#

Evaluate the value of the logarithm of the target unscaled posterior density function

diagnostic_statistics(proposals_repository, uniform_probabilities, acceptance_flags, collected_ensemble, plot_diagnostics=True, output_dir=None, plot_title='Rejection Sampling', filename_prefix='Rejection_Sampling')[source]#

Return diagnostic statistics of the sampler such as the rejection rate, acceptance ratio, etc.

update_random_number_generators(random_seed, update_proposal=True)[source]#

Reset/Update the underlying random_number generator by resetting it’s seed. The random number generator is provided by the RandomNumberGenerationMixin If update_proposal is True do the same for underlying proposal This actually replaces the current random number generator(s) with a new one(s) created from the given random_seed.

property size#

Return the dimentionsize of the underlying probability space

property proposal#

Get a handle of the proposal

property optimizer#
create_rejection_sampler(size, log_density, proposal=None, bounds=None, constraint_test=None, output_dir=None, random_seed=None)[source]#
Given the size of the target space, and a function to evalute log density,

create and return an RejectionSampler instance/object to generate samples using standard Rejection Sampling approach. Configurations/settings can be updated after inistantiation

Parameters:
  • size (int) – dimension of the target distribution to sample

  • log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar

  • proposal (Proposal) – proposal instance (to generate candidate samples)

  • constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

  • random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules

Returns:

instance of RejctionSampler (with some or all configurations passed)

Markov Chain Monte Carlo Sampling#

class ChainDiagnostics(*, acceptance_rate, rejection_rate)[source]#

Bases: PyOEDData

A dataclass to hold the diagnostics of the Markov chain

Parameters:
  • acceptance_rate (float) – the rate of acceptance of the proposed samples

  • rejection_rate (float) – the rate of rejection of the proposed samples

acceptance_rate: float#
rejection_rate: float#
__init__(*, acceptance_rate, rejection_rate)#
class SamplingResults(*, chain_state_repository, collected_ensemble, proposals_repository, acceptance_flags, acceptance_probabilities, uniform_random_numbers, map_estimate, map_estimate_log_density, chain_diagnostics, chain_time)[source]#

Bases: PyOEDData

A dataclass to hold the results of the MCMC sampling

Parameters:
  • chain_state_repository (list[ndarray]) – a list of all generated states in the Markov chain

  • collected_ensemble (list[ndarray]) – a list of all collected samples from the Markov chain

  • proposals_repository (list[ndarray]) – a list of all proposed states in the Markov chain

  • acceptance_flags (list[int]) – a list of flags indicating whether the proposed states were accepted or not

  • acceptance_probabilities (list[float]) – a list of probabilities of accepting the proposed states

  • uniform_random_numbers (list[float]) – a list of uniform random numbers used in the acceptance step

  • map_estimate (ndarray) – the MAP estimate of the distribution

  • map_estimate_log_density (float) – the log-density of the MAP estimate

  • chain_diagnostics (ChainDiagnostics | None) – the diagnostics of the Markov chain

  • chain_time (float) – the time taken to generate the Markov chain

chain_state_repository: list[ndarray]#
collected_ensemble: list[ndarray]#
proposals_repository: list[ndarray]#
acceptance_flags: list[int]#
acceptance_probabilities: list[float]#
uniform_random_numbers: list[float]#
map_estimate: ndarray#
map_estimate_log_density: float#
chain_diagnostics: ChainDiagnostics | None#
chain_time: float#
__init__(*, chain_state_repository, collected_ensemble, proposals_repository, acceptance_flags, acceptance_probabilities, uniform_random_numbers, map_estimate, map_estimate_log_density, chain_diagnostics, chain_time)#
class MCMCSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, burn_in=500, mix_in=10, proposal=None, constraint_test=None)[source]#

Bases: SamplerConfigs

Configurations for the MCMC sampler MCMCSampler.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.

  • size (int | None) – dimension of the target distribution to sample

  • log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled

  • burn_in (int) – number of sample points to discard before collecting samples

  • mix_in (int) – number of generated samples between accepted ones (to descrease autocorrelation)

  • proposal (Proposal | None) – a proposal object to be used for generating new samples

  • constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

size: int | None#
log_density: Callable[[ndarray], float] | None#
burn_in: int#
mix_in: int#
proposal: Proposal | None#
constraint_test: Callable[[ndarray], bool] | None#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, burn_in=500, mix_in=10, proposal=None, constraint_test=None)#
class MCMCSampler(configs=None)[source]#

Bases: Sampler

Basic class for MCMC sampling with a chosen (Default is a Gaussian) proposal

__init__(configs=None)[source]#

Implementation of the MCMC sampling algorithm with a predefined proposal

Parameters:

configs (dict | MCMCSamplerConfigs | None) – a configurations object. See MCMCSamplerConfigs.

validate_configurations(configs, raise_for_invalid=True)[source]#

A method to check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.

Parameters:
  • configs (dict) – a dictionary holding key/value configurations

  • raise_for_invalid (bool) – if True raise TypeError for invalid configrations type/key

Returns:

True/False flag indicating whether passed coinfigurations dictionary is valid or not

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid

Raises:

TypeError is raised if any of the passed keys in kwargs is invalid/unrecognized

Remarks:

Generally, we don’t want actual implementations in abstract classes, however, this one is provided as good guidance. Derived classes can rewrite it.

generate_white_noise(size, truncate=False)[source]#

Generate a standard normal random vector of size size with values truncated at -/+3 if truncate is set to True

Returns:

a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.

Remarks:
  • this function returns a numpy array of size size even if size is set to 1

log_density(state)[source]#

Evaluate the value of the logarithm of the target unscaled posterior density function

sample(sample_size=1, initial_state=None, full_diagnostics=False)[source]#

Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution

Parameters:
  • sample_size (int)

  • initial_state

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

  • verbose (bool)

map_estimate(sample_size=100, initial_state=None, full_diagnostics=False)[source]#

Search for a MAP (maximum aposteriori) estimate by sampling (space exploration) This method returns a single-point estimate of the MAP of the distribution.

Parameters:
  • sample_size

  • initial_state

  • full_diagnostics

start_MCMC_sampling(sample_size, initial_state=None, full_diagnostics=False)[source]#

Start the HMC sampling procedure with initial state as passed. Use the underlying configurations for configuring the Hamiltonian trajectory, burn-in and mixin settings.

Parameters:
  • sample_size (int) – number of smaple points to generate/collect from the predefined target distribution

  • initial_state – initial point of the chain (any point that falls in the target distribution or near by it will result in faster convergence). You can try prior mean if this is used in a Bayesian approach

  • randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

mcmc_chain_diagnostic_statistics(proposals_repository, chain_state_repository, uniform_probabilities, acceptance_probabilities, collected_ensemble, map_estimate=None, acceptance_flags=None, output_dir=None, plot_title='MCMC', filename_prefix='MCMC_Sampling')[source]#

Return diagnostic statistics of the chain such as the rejection rate, acceptance ratio, etc.

update_random_number_generators(random_seed, update_proposal=True)[source]#

Reset/Update the underlying random_number generator by resetting it’s seed. The random number generator is provided by the RandomNumberGenerationMixin If update_proposal is True do the same for underlying proposal This actually replaces the current random number generator(s) with a new one(s) created from the given random_seed.

property size#

Return the dimentionsize of the underlying probability space

property mix_in#

Return the number of generated samples between accepted ones

property burn_in#

Return the number of sample points to discard before collecting samples

property constraint_test: Callable[[ndarray], bool]#

Return the function that returns a boolean value True if sample point satisfy any constraint, False otherwise.

property proposal#

Get a handle of the proposal

class VALID_INTEGRATORS(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: StrEnum

Valid symplectic integrators for the HMC sampler

LEAPFROG = '\\Aleapfrog\\Z'#
VERLET = '\\Averlet\\Z'#
TWO_STAGE = '\\A2(-|_| )*stage(s)*\\Z'#
THREE_STAGE = '\\A3(-|_| )*stage(s)*\\Z'#
class MassMatrix(*, matrix, inverse, sqrt)[source]#

Bases: PyOEDData

Data container for a covariance matrix of a Gaussian proposal.

matrix: ndarray | spmatrix#
inverse: ndarray | spmatrix#
sqrt: ndarray | spmatrix#
__init__(*, matrix, inverse, sqrt)#
class HMCSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, log_density_grad=None, burn_in=500, mix_in=10, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=20, mass_matrix=1, sparse=True, constraint_test=None)[source]#

Bases: SamplerConfigs

Configurations for the HMC sampler HMCSampler.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.

  • size (int | None) – dimension of the target distribution to sample

  • log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled

  • log_density_grad (Callable[[ndarray], ndarray] | None) – the gradient of the log_density function passed. If None, it will be approximated using finite differences.

  • burn_in (int) – number of sample points to discard before collecting samples

  • mix_in (int) – number of generated samples between accepted ones (to descrease autocorrelation)

  • symplectic_integrator (str) – name of the symplectic integrator to use; acceptable are ‘leapfrog’, ‘2-stage’, ‘3-stage’, where both ‘leapfrog’ and ‘verlet’ are equivalent.

  • symplectic_integrator_stepsize (float) – the step size of the symplectic integrator

  • symplectic_integrator_num_steps (int) – number of steps of size symplectic_integrator_stesize

  • mass_matrix (float | None | ndarray | spmatrix) – mass matrix to be used to adjust sampling the auxilliary Gaussian momentum

  • constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

size: int | None#
log_density: Callable[[ndarray], float] | None#
log_density_grad: Callable[[ndarray], ndarray] | None#
burn_in: int#
mix_in: int#
symplectic_integrator: str#
symplectic_integrator_stepsize: float#
symplectic_integrator_num_steps: int#
mass_matrix: float | None | ndarray | spmatrix#
sparse: bool#
constraint_test: Callable[[ndarray], bool] | None#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, log_density_grad=None, burn_in=500, mix_in=10, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=20, mass_matrix=1, sparse=True, constraint_test=None)#
class HMCSampler(configs=None)[source]#

Bases: Sampler, RandomNumberGenerationMixin

__init__(configs=None)[source]#

Implementation of the HMC sampling algorithm (with multiple choices of the symplectic integrator).

Parameters:

configs (dict | HMCSamplerConfigs | None) – A configurations object. See HMCSamplerConfigs.

standardize_configurations(configs)[source]#
validate_configurations(configs, raise_for_invalid=True)[source]#

A method to check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used

Parameters:
  • configs (dict | SamplerConfigs) – a dictionary holding key/value configurations

  • raise_for_invalid (bool) – if True raise TypeError for invalid configrations type/key

Returns:

True/False flag indicating whether passed coinfigurations dictionary is valid or not

Raises:

see the parameter raise_for_invalid

Return type:

bool

update_configurations(**kwargs)[source]#

Update the configurations of the sampler object with the passed key-value pairs.

update_mass_matrix(mass_matrix)[source]#

Update the mass matrix (standardize, update configurations, and update locally maintained MassMatrix instance.)

sample(sample_size=1, initial_state=None, full_diagnostics=False)[source]#

Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution

Parameters:
  • sample_size (int)

  • initial_state

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

map_estimate(sample_size=100, initial_state=None, full_diagnostics=False)[source]#

Search for a MAP (maximum aposteriori) estimate by sampling (space exploration) This method returns a single-point estimate of the MAP of the distribution

Parameters:
  • sample_size (int)

  • initial_state

  • verbose (bool)

start_MCMC_sampling(sample_size, initial_state=None, randomize_step_size=False, full_diagnostics=False)[source]#

Start the HMC sampling procedure with initial state as passed. Use the underlying configurations for configuring the Hamiltonian trajectory, burn-in and mixin settings.

Parameters:
  • sample_size (int) – number of smaple points to generate/collect from the predefined target distribution

  • initial_state – initial point of the chain (any point that falls in the target distribution or near by it will result in faster convergence). You can try prior mean if this is used in a Bayesian approach

  • randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration

  • full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory

generate_white_noise(size, truncate=False)[source]#
Generate a standard normal random vector of size size with values truncated

at -/+3 if truncate is set to True

Returns:

a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.

Remarks:
  • this function returns a numpy array of size size even if size is set to 1

mass_matrix_matvec(momentum)[source]#

Multiply the mass matrix (in the configurations) by the passed momentum

mass_matrix_inv_matvec(momentum)[source]#

Multiply the inverse of the mass matrix (in the configurations) by the passed momentum

mass_matrix_sqrt_matvec(momentum)[source]#

Multiply the Square root (Lower Cholesky factor) of the mass matrix (in the configurations) by the passed momentum

log_density(state)[source]#

Evaluate the value of the logarithm of the target unscaled posterior density function

log_density_grad(state)[source]#

Evaluate the gradient of the logarithm of the target unscaled posterior density function

potential_energy(state)[source]#

Evaluate the value of the potential energy at the given state. The potential energy is the negative value of the logarithm of the unscaled posterior density function.

potential_energy_grad(state)[source]#

Evaluate the gradient of the potential energy at the given state. The potential energy is the negative value of the logarithm of the unscaled posterior density function

kinetic_energy(momentum)[source]#

Evaluate the Kinetic energy of the posterior; this is independent from the state and is evaluated as the weighted l2 norm of the momentum (scaled by the inverse of hte mass matrix); This is half of the squared Mahalanobis distance of the Gaussian momentum.

total_Hamiltonian(momentum, state)[source]#

Evaluate the value of the total energy function: Hamiltonian = kinetic energy + potential energy

build_Hamiltonian_trajectory(momentum, state, step_size, num_steps, randomize_step_size=False)[source]#

Given the current momentum and state pair of the Hamiltonian system, generate a trajectory of (momentum, state).

Parameters:
  • momentum

  • state

  • randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration

apply_symplectic_integration(momentum, state, step_size, num_steps, randomize_step_size=False, symplectic_integrator='3-stage')[source]#

Apply one full step of size step_size of the symplectic integrator to the Hamiltonian system

Parm momentum:

Parameters:
  • state

  • num_steps (int)

  • step_size (float)

  • symplectic_integrator (str) – name of the symplectic integrator to use; acceptable are: ‘verlet’, ‘leapfrog’, ‘2-stage’, ‘3-stage’, where both ‘leapfrog’ and ‘verlet’ are equivalent

  • randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration

Returns:

a tuple (p, s) where p and s are the integrated (forward in time) momentum and state respectively

mcmc_chain_diagnostic_statistics(proposals_repository, chain_state_repository, uniform_probabilities, acceptance_probabilities, collected_ensemble, map_estimate=None, acceptance_flags=None, output_dir=None, plot_title='MCMC', filename_prefix='MCMC_Sampling')[source]#

Return diagnostic statistics of the chain such as the rejection rate, acceptance ratio, etc.

property size#

Return the dimentionsize of the underlying probability space

property mix_in#

Return the number of generated samples between accepted ones

property burn_in#

Return the number of sample points to discard before collecting samples

property symplectic_integrator#

Return the name of the symplectic integrator used

property symplectic_integrator_stepsize#

Return the step size of the symplectic integrator

property symplectic_integrator_num_steps#

Return the number of steps of the symplectic integrator

property mass_matrix: MassMatrix#

Return the constructed mass matrix object

property constraint_test: Callable[[ndarray], bool]#

Return the function that returns a boolean value True if sample point satisfy any constraint, False otherwise.

create_mcmc_sampler(size, log_density, burn_in=100, mix_in=10, constraint_test=None, proposal_variance=0.1, output_dir=None, random_seed=None)[source]#
Given the size of the target space, and a function to evalute log density,

create and return an MCMCSampler instance/object to generate samples using standard MCMC sampling approach. Configurations/settings can be updated after inistantiation

Parameters:
  • size (int) – dimension of the target distribution to sample

  • log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar

  • burn_in (int) – number of steps to drop before collecting samples in the chain (for convergence)

  • mix_in (int) – number of steps to drop between collected samples (reduce autocorrelation)

  • proposal_variance – variance of the Gaussian proposal

  • constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

  • random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules

Returns:

instance of MCMCSampler (with some or all configurations passed)

create_hmc_sampler(size, log_density, log_density_grad=None, burn_in=100, mix_in=5, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=10, mass_matrix=0.1, constraint_test=None, random_seed=None, output_dir=None)[source]#
Given the size of the target space, and a function to evalute log density,

create and return an HMCSampler instance/object to generate samples using HMC sampling approach. Configurations/settings can be updated after inistantiation

Parameters:
  • size (int) – dimension of the target distribution to sample

  • log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar

  • log_density_grad – a callable (function) to evaluate the gradient of log_density; this function takes one vector (1d array/iterable) of length equal to size, and returns a vector of the same length. If None is passed, the gradient is automatically evaluated using finite differences (FD).

  • burn_in (int) – number of steps to drop before collecting samples in the chain (for convergence)

  • mix_in (int) – number of steps to drop between collected samples (reduce autocorrelation)

  • symplectic_integrator_stepsize (float) – (positive scalar) the step size of the symplectic integrator

  • symplectic_integrator_num_steps (int) – (postive integer) number of steps of size symplectic_integrator_stesize taken before returnig the next proposed point over the Hamiltonian trajectory

  • mass_matrix – (nonzero scalar or SPD array of size size x size) mass matrix to be used to adjust sampling the auxilliary Gaussian momentum

  • constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.

  • random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules

Returns:

instance of HMCSampler (with some or all configurations passed)

banana_potential_energy_value(state, a=2.15, b=0.75, rho=0.9)[source]#

Potential energy of the posterir. This is dependent on the target state, not the momentum. It is the negative of the posterior-log, and MUST be implemented for each distribution

banana_potential_energy_gradient(state, a=2.15, b=0.75, rho=0.9)[source]#

Gradient of the Potential energy of the posterir.

banana_log_density(state, a=2.15, b=0.75, rho=0.9)[source]#

The logarithm of the banana distribution PDF

banana_log_density_gradient(state, a=2.15, b=0.75, rho=0.9)[source]#

The gradient of the logarithm of the banana distribution PDF

sample_banana_distribution(sample_size, verbose=False, method='hmc', constraint_test=None)[source]#

Simplified example to sample a Banana-shaped probability distributions

Proposals#

class GaussianProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Gaussian', random_seed=None, size=None, mean=None, variance=1.0, constraint_test=None, constraint_max_trials=100)[source]#

Bases: ProposalConfigs

Configurations for the GaussianProposal class.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • name (str) – name of the distribution

  • random_seed (int | None) – random seed used for pseudo random number generation

  • size (int | None) – dimension of the target distribution to sample.

  • mean (float | ndarray | None) – mean of the proposal. If None (default), the mean is set to the current state passed to the sample or __call__ method.

  • variance (float | ndarray | spmatrix) – variance of the proposal. If None, the variance is set to 1.0.

  • constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any desired constraints, and False otherwise; ignored if None, is passed.

  • constraint_max_trials (int) – the maximum number of trials to satisfy constraints (e.g., bounds) for each sample point. Default is 100.

size: int | None#
mean: float | ndarray | None#
variance: float | ndarray | spmatrix#
constraint_test: Callable[[ndarray], bool] | None#
constraint_max_trials: int#
sparse = True#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Gaussian', random_seed=None, size=None, mean=None, variance=1.0, constraint_test=None, constraint_max_trials=100)#
class GaussianCovarianceMatrix(*, matrix, inverse, sqrt)[source]#

Bases: PyOEDData

Data container for a covariance matrix of a Gaussian proposal.

matrix: ndarray | spmatrix#
inverse: ndarray | spmatrix#
sqrt: ndarray | spmatrix#
__init__(*, matrix, inverse, sqrt)#
class GaussianProposal(configs=None)[source]#

Bases: Proposal, RandomNumberGenerationMixin

A class implementing a Gaussian proposal for MCMC sampling.

Parameters:

configs (dict | GaussianProposalConfigs | None) – Configurations object, see GaussianProposalConfigs

__init__(configs=None)[source]#

Initialize the random number generator

validate_configurations(configs, raise_for_invalid=True)[source]#

A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.

Parameters:
  • configs – an object holding configurations

  • raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.

Returns:

True/False flag indicating whether passed configurations dictionary is valid or not

generate_white_noise(size, truncate=False)[source]#
Generate a standard normal random vector of size size with values truncated at

-/+3 if truncate is set to True

Parameters:
  • size (int) – dimension of the probability space to sample

  • truncate (bool) – if True, truncate the samples at -/+3, that is any sample point/entry above 3 is set to 3, and any value below -3 is set to -3.

Returns:

a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.

Remarks:
  • this function returns a numpy array of size size even if size is set to 1

Return type:

ndarray

covariance_matrix_matvec(state)[source]#

Multiply the mass matrix (in the configurations) by the passed state/vector

covariance_matrix_inv_matvec(state)[source]#

Multiply the inverse of the mass matrix (in the configurations) by the passed state/vector

covariance_matrix_sqrt_matvec(state)[source]#

Multiply the Square root (Lower Cholesky factor) of the mass matrix (in the configurations) by the passed state/vector

sample(sample_size=1, initial_state=None)[source]#

Generate and return a sample of size sample_size from a Gaussian distribution centered around the passed initial_state. If the initial_state is not passed, the preconfigured mean is used. Thus, the proposal mean must be set in configurations, otherwise a TypeError is raised

Returns:

1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.

Return type:

ndarray

pdf(state, normalize=False, log=False, mean=None)[source]#

Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state/vector.

If the mean is not passed, the preconfigured mean is used, thus, the proposal mean must be set in configurations, otherwise a TypeError is raised

Parameters:
  • state – What to evaluate the density function at.

  • normalize (bool) – scale the PDF by the normalization factor

  • log (bool) – if True, evaluate the gradient of the logarithm of the PDF

Returns:

the value of the density function evaluated at the passed state/vector.

pdf_gradient(state, normalize=False, log=False, mean=None)[source]#

Evaluate the gradient of the density function at the passed state/vector.

If the mean is not passed, the preconfigured mean is used, thus, the proposal mean must be set in configurations, otherwise an error raised

Parameters:
  • state

  • normalize (bool) – scale the PDF by the normalization factor

  • log (bool) – if True, evaluate the gradient of the logarithm of the PDF

Returns:

the value of the density function evaluated at the passed state/vector.

log_density(state, normalize=False, mean=None)[source]#

Evaluate the logarithm of the density function at the passed state, where the distrubution mean is adjusted to mean. If mean is None the configured mean is used. This parameter is passed to satisify the needs for the case where the proposal mean changes frequently, e.g., MCMC.

Parameters:
  • state

  • mean

  • normalize (bool) – scale the PDF by the normalization factor

  • normalize

Returns:

the logarithm of the density function evaluated at the passed state.

log_density_gradient(state, normalize=False, mean=None)[source]#

Evaluate the gradient of the logarithm of the density function at the passed state, where the distrubution mean is adjusted to mean. If mean is None the configured mean is used. This parameter is passed to satisify the needs for the case where the proposal mean changes frequently, e.g., MCMC.

Parameters:
  • state – the state to evaluate the gradient at

  • normalize – scale the PDF by the normalization factor

  • mean – the mean of the proposal distribution. If None, the configured mean is used.

Returns:

the gradient of the logarithm of the density function evaluated at the passed state.

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid.

property size#

Return the dimentionsize of the underlying probability space

property mean#

Return the mean of the Gaussian proposal

property covariance: GaussianCovarianceMatrix#

Return the covariance matrix of the Gaussian proposal

property sparse#

Return the sparsity configuration

property constraint_test#

Return the constraint test function

property constraint_max_trials#

Return the maximum number of trials to satisfy constraints

class GaussianMixtureModelProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GMM Proposal', random_seed=123, size=None, num_components=None, weights=None, means=None, variances=None, constraint_test=None, constraint_max_trials=100, sparse=True)[source]#

Bases: ProposalConfigs

Configurations for the GaussianMixtureModelProposal class.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • name (str) – name of the distribution

  • random_seed (int) – random seed used for pseudo random number generation

  • size (int | None) – dimension of the target distribution to sample.

  • num_components (int | None) – the number of components in the mixture. This is mandatory.

  • weights (list[float] | None) – a list of weights of each of the GMM component. These must sum up to 1. This is mandatory.

  • means (list[float | ndarray] | None) – a list of means of each of the components of the GMM model. If None, all components will be centered around the current state passed to the sample or __call__ method.

  • variances (list[float | ndarray | spmatrix] | None) – a list of variances of each of the components of the GMM model. If None, all components will have a variance of 1.0.

size: int | None#
num_components: int | None#
weights: list[float] | None#
means: list[float | ndarray] | None#
variances: list[float | ndarray | spmatrix] | None#
constraint_test: Callable[[ndarray], bool] | None#
constraint_max_trials: int#
sparse: bool#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GMM Proposal', random_seed=123, size=None, num_components=None, weights=None, means=None, variances=None, constraint_test=None, constraint_max_trials=100, sparse=True)#
class GaussianMixtureModelProposal(configs=None)[source]#

Bases: Proposal, RandomNumberGenerationMixin

A class implementing a Gaussian Mixture Model (GMM) proposal for MCMC sampling.

Parameters:

configs (dict | GaussianMixtureModelProposalConfigs | None) – A configurations object, see GaussianMixtureModelProposalConfigs

__init__(configs=None)[source]#

Initialize the random number generator

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid.

validate_configurations(configs, raise_for_invalid=True)[source]#

A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.

Parameters:
  • configs (dict | GaussianMixtureModelProposalConfigs) – an object holding configurations

  • raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.

Returns:

True/False flag indicating whether passed configurations dictionary is valid or not

Return type:

bool

generate_white_noise(size, truncate=False)[source]#
Generate a standard normal random vector of size size with values truncated

at -/+3 if truncate is set to True

Parameters:
  • size (int) – dimension of the probability space to sample

  • truncate (bool) – if True, truncate the samples at -/+3, that is any sample point/entry above 3 is set to 3, and any value below -3 is set to -3.

Returns:

a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.

Remarks:
  • this function returns a numpy array of size size even if size is set to 1

covariance_matrix_matvec(state, component)[source]#

Multiply the covariance matrix of the choosen component by the passed state/vector

Parameters:
  • state – the state (random variable) to multiple covariance by

  • component (int) – the index of the component to multiply by.

covariance_matrix_inv_matvec(state, component)[source]#

Multiply the inverse of the covariance matrix of the choosen component by the passed state/vector

Parameters:
  • state – the state (random variable) to multiple covariance by

  • component (int) – the index of the component to multiply by.

covariance_matrix_sqrt_matvec(state, component)[source]#

Multiply the square root (lower Cholesky factor) of the covariance matrix of the choosen component by the passed state/vector

Parameters:
  • state – the state (random variable) to multiple covariance by

  • component (int) – the index of the component to multiply by.

sample_component(component, sample_size=1, initial_state=None)[source]#

Generate and return a sample of size sample_size from a Gaussian distribution centered around the passed initial_state. If the initial_state is not passed, the preconfigured mean of the corresponding component is used, thus, the proposal mean must be set in configurations, otherwise an error is raised

Parameters:
  • component (int) – the index of the component to multiply by.

  • sample_size (int) – the number of samples to generate

  • initial_state – the state to propose around. If None, the preconfigured mean of the component is used.

Returns:

1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.

Return type:

ndarray

sample(sample_size=1, initial_state=None)[source]#

Generate GMM samples

Parameters:
  • sample_size (int) – the number of samples to generate from the GMM model

  • initial_state – the state to propose around. If None, the preconfigured mean of the component is used.

Returns:

a numpy array of shape (sample_size, size) sampled from the GMM model

Return type:

ndarray

pdf(state, normalize=False, log=False, mean=None)[source]#

Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state/vector.

If a value of the mean is passed, all components’s means are shifted to this state. This gives flexibility in sampling using methods such as MCMC

Parameters:
  • state

  • normalize (bool) – scale the PDF by the normalization factor

  • log (bool) – if True, evaluate the gradient of the logarithm of the PDF

Returns:

the value of the density function evaluated at the passed state/vector.

log_density(state, normalize=False, mean=None)[source]#

Evaluate the logarithm of the density function at the passed state.

If a value of the mean is passed, all components’s means are shifted to this state. This gives flexibility in sampling using methods such as MCMC

Parameters:
  • state

  • component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …

  • normalize (bool) – scale the PDF by the normalization factor

Returns:

the logarithm of the density function evaluated at the passed state.

pdf_component(state, component, normalize=False, mean=None)[source]#

Evaluate the value of the density function of one component (a Gaussian) of the GMM model.

If a value of the mean is passed, the coponent’s means is shifted to this, otherwise, the one in the configurations dictionary is used. This gives flexibility in sampling using methods such as MCMC

Parameters:
  • state

  • component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …

  • normalize (bool) – scale the PDF by the normalization factor

Returns:

the logarithm of the density function evaluated at the passed state.

log_density_component(state, component, normalize=False, mean=None)[source]#

Evaluate the logarithm of the value of the density function of one component (a Gaussian) of the GMM model.

If a value of the mean is passed, the coponent’s means is shifted to this, otherwise, the one in the configurations dictionary is used. This gives flexibility in sampling using methods such as MCMC

Parameters:
  • state

  • component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …

  • normalize (bool) – scale the PDF by the normalization factor

Returns:

the logarithm of the density function evaluated at the passed state.

property size#

Return the dimentionsize of the underlying probability space

property num_components#

Return the number of components in the GMM model

property weights#

Return the weights of the GMM model

property means#

Return the means of the GMM model

property sparse#

Return the sparsity configuration

property constraint_test#

Return the constraint test function

property constraint_max_trials#

Return the maximum number of trials to satisfy constraints

property covariances: list[GaussianCovarianceMatrix]#

Return the covariance matrices of the GMM model

class ExponentialProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Exponential', random_seed=123, size=None, scale=None, constraint_test=None, constraint_max_trials=100)[source]#

Bases: ProposalConfigs

Configurations for the Exponential proposal class.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • name (str) – name of the distribution

  • size (int | None) – dimension of the target distribution to sample.

  • scale (float | ndarray | None) – scale parameter of the proposal. This may be a scalar or a vector of size size.

  • constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point

  • constraint_max_trials (int) – the maximum number of trials to satisfy constraints

  • random_seed (int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.

size: int | None#
scale: float | ndarray | None#
constraint_test: Callable[[ndarray], bool] | None#
constraint_max_trials: int#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Exponential', random_seed=123, size=None, scale=None, constraint_test=None, constraint_max_trials=100)#
class ExponentialProposal(configs=None)[source]#

Bases: Proposal, RandomNumberGenerationMixin

A class implementing an Exponential proposal for MCMC sampling. The distribution assumes independent components of the random vector.

__init__(configs=None)[source]#

Initialize the Exponential proposal object.

Parameters:

configs (dict | ExponentialProposalConfigs | None) – A configurations object, see ExponentialConfigs

validate_configurations(configs, raise_for_invalid=True)[source]#

A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.

Parameters:
  • configs (dict | ExponentialProposalConfigs) – an object holding configurations

  • raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.

Returns:

True/False flag indicating whether passed configurations dictionary is valid or not

Return type:

bool

sample(sample_size=1, initial_state=None)[source]#

Generate and return a sample of size sample_size from an iid Exponential distribution with predefine scale parameter. The initial_state is not used here; it is added to unify the sample() interface across proposals.

Returns:

1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.

pdf(state, normalize=False, log=False)[source]#

Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state.

Parameters:
  • state

  • normalize (bool) – scale the PDF by the normalization factor

  • log (bool) – if True, evaluate the gradient of the logarithm of the PDF

Returns:

the value of the density function evaluated at the passed state.

pdf_gradient(state, normalize=False, log=False)[source]#

Evaluate the gradient of the density function at the passed state/vector.

Parameters:
  • state

  • normalize (bool) – scale the PDF by the normalization factor

  • log (bool) – if True, evaluate the gradient of the logarithm of the PDF

Returns:

the value of the density function evaluated at the passed state/vector.

log_density(state, normalize=False)[source]#

Evaluate the logarithm of the density function at the passed state.

Parameters:
  • state

  • normalize (bool)

Returns:

the logarithm of the density function evaluated at the passed state.

log_density_gradient(state)[source]#

Evaluate the gradient of the logarithm of the density function at the passed state.

Parameters:
  • state

  • normalize (bool)

Returns:

the gradient of the logarithm of the density function evaluated at the passed state.

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid

property size#

Return the dimentionsize of the underlying probability space

property scale#

Return the scale parameter of the exponential proposal

property constraint_test#

Return the constraint test function

property constraint_max_trials#

Return the maximum number of trials to satisfy constraints

visualize_proposal(proposal, sample_size=1000, xlim=None, ylim=None, title=None, grid_size=200, labels=None, linewidth=1.0, markersize=2, fontsize=18, fontweight='bold', keep_plots=False, output_dir=None, filename_prefix=None, verbose=False)[source]#
Parameters:
  • proposal (Proposal)

  • sample_size (int) – size of the sample to generate for scatter plotting

  • xlim – x-limit of plots (only used for 2D case)

  • ylim – y-limit of plots (only used for 2D case)

  • title – title to be added to created plot(s)

  • grid_size – number of discretization grid points to create 2D mesh for plots (only used for 2D case)

  • labels – iterable of labels to be used for the variables on plots (X_i auto used if not passed)

  • linewidth – width of lines (e.g., for contour plots)

  • markersize – markers in scatter plots

  • fontsize – general font size on all plots

  • fontweight – font weighti

  • keep_plots – if False all plots will be closed by calling matplotlib.pyplot.close('all')()

  • output_dir – location to save files

  • filename_prefix – if None, a prefix with proposal name will be chosen, e.g., “Exponential_Proposal_”

  • verbose (bool) – screen verbosity

Remarks:

Set keep_plots to True, for example, if you are doing interactive plotting or using a notebook to show plots.

create_Gaussian_proposal(size, mean=None, variance=1, random_seed=None, output_dir=None, constraint_test=None)[source]#

Given the size of the target space, create and return an GaussianProposal instance/object to generate samples using Gaussian proposal centered around current state (or predefined mean) Configurations/settings can be updated after inistantiation

This function shows how to create GaussianProposal instances (with some or all configurations passed)

create_GaussianMixtureModel_proposal(size=2, num_components=3, weights=[0.5, 0.25, 0.25], means=[(0, 0), (0, 0.25), (0.25, 0)], variances=[0.04, 0.04, 0.04], random_seed=None, constraint_test=<function <lambda>>)[source]#

Given the size of the target space, create and return an GaussianMixtureModelProposal instance/object to generate samples using GMM proposal centered around current state (or predefined mean) Configurations/settings can be updated after inistantiation

This function shows how to create GaussianProposal instances (with some or all configurations passed)

create_Exponential_proposal(size, scale=1, random_seed=None, constraint_test=None, output_dir=None)[source]#

Given the size of the target space, and the scale parameter, create and return an ExponentialProposal instance/object to generate samples using and iid Exponential proposal Configurations/settings can be updated after inistantiation

This function shows how to create ExponentialProposal instances

(with some or all configurations passed)