Sampling Methods#
Entries on this page:
Note
All sampling classes (samplers) inherit the
Sampler base class (Sampler)
and each sampler is associated with a configurations class derived from
the Sampler configurations base class (SamplerConfigs)
Rejection Sampling#
- class RejectionSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, proposal=None, constraint_test=None, bounds=None)[source]#
Bases:
SamplerConfigs
Configurations for the rejection sampler
RejectionSampler
. In addition to the attributes provided bySamplerConfigs
, this configurations class provides the following attributes:- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
size (int | None) – dimension of the target distribution to sample
log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled
proposal (Proposal | None) – a proposal object to be used for generating new samples
constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
bounds (Sequence | None) – bounds of the domain of the target random variable (if any)
random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.
- size: int | None#
- log_density: Callable[[ndarray], float] | None#
- constraint_test: Callable[[ndarray], bool] | None#
- bounds: Sequence | None#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, proposal=None, constraint_test=None, bounds=None)#
- class RejectionSampler(configs=None)[source]#
Bases:
Sampler
Implementation of the rejection sampling algorithm with a predefined proposal.
- Parameters:
configs (RejectionSamplerConfigs | dict | None) – a configurations object. See
RejectionSamplerConfigs
.- Note:
Validation of the configurations dictionary is taken care of in the super class
If a proposal is passed in the configurations, ‘constraint_test’ should be set to it. If a constraint test is passed both here and in the proposal, the one passed here will overwrite the one associated with the proposal
- validate_configurations(configs, raise_for_invalid=True)[source]#
Check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used
- Parameters:
configs (dict | RejectionSamplerConfigs) – full or partial (subset) configurations to be validated
raise_for_invalid (bool) – if True raise
TypeError
for invalid configrations type/key. Default True
- Returns:
flag indicating whether passed configurations dictionary is valid or not
- Raises:
AttributeError – if any (or a group) of the configurations does not exist in the model configurations
RejectionSamplerConfigs
.PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.
- Return type:
bool
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid
- Raises:
PyOEDConfigsValidationError – if invalid configurations passed
- sample(sample_size=1, full_diagnostics=False)[source]#
Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution
- Parameters:
sample_size (int)
initial_state
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- Returns:
a list of samples collected from the target distributions
- start_batch_rejection_sampling(sample_size, full_diagnostics=False)[source]#
Start the the rejection sampling procedure. Unlike start_rejection_sampling, this function works by proposing a batch of samples (from the proposal) of size equal to the number of remaining samples to be collected; those who are accepted and satisfy the underlying constraint_test if one is set in the configurations.
- Parameters:
sample_size (int) – number of smaple points to generate/collect from the predefined target distribution
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- Remarks:
This will replace start_rejection_sampling for efficiency; timing will decide!
- start_rejection_sampling(sample_size, full_diagnostics=False)[source]#
Start the the rejection sampling procedure.
- Parameters:
sample_size (int) – number of smaple points to generate/collect from the predefined target distribution
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- log_density(state)[source]#
Evaluate the value of the logarithm of the target unscaled posterior density function
- diagnostic_statistics(proposals_repository, uniform_probabilities, acceptance_flags, collected_ensemble, plot_diagnostics=True, output_dir=None, plot_title='Rejection Sampling', filename_prefix='Rejection_Sampling')[source]#
Return diagnostic statistics of the sampler such as the rejection rate, acceptance ratio, etc.
- update_random_number_generators(random_seed, update_proposal=True)[source]#
Reset/Update the underlying random_number generator by resetting it’s seed. The random number generator is provided by the RandomNumberGenerationMixin If update_proposal is True do the same for underlying proposal This actually replaces the current random number generator(s) with a new one(s) created from the given random_seed.
- property size#
Return the dimentionsize of the underlying probability space
- property proposal#
Get a handle of the proposal
- property optimizer#
- create_rejection_sampler(size, log_density, proposal=None, bounds=None, constraint_test=None, output_dir=None, random_seed=None)[source]#
- Given the size of the target space, and a function to evalute log density,
create and return an
RejectionSampler
instance/object to generate samples using standard Rejection Sampling approach. Configurations/settings can be updated after inistantiation
- Parameters:
size (int) – dimension of the target distribution to sample
log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar
proposal (Proposal) – proposal instance (to generate candidate samples)
constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules
- Returns:
instance of
RejctionSampler
(with some or all configurations passed)
Markov Chain Monte Carlo Sampling#
- class ChainDiagnostics(*, acceptance_rate, rejection_rate)[source]#
Bases:
PyOEDData
A dataclass to hold the diagnostics of the Markov chain
- Parameters:
acceptance_rate (float) – the rate of acceptance of the proposed samples
rejection_rate (float) – the rate of rejection of the proposed samples
- acceptance_rate: float#
- rejection_rate: float#
- __init__(*, acceptance_rate, rejection_rate)#
- class SamplingResults(*, chain_state_repository, collected_ensemble, proposals_repository, acceptance_flags, acceptance_probabilities, uniform_random_numbers, map_estimate, map_estimate_log_density, chain_diagnostics, chain_time)[source]#
Bases:
PyOEDData
A dataclass to hold the results of the MCMC sampling
- Parameters:
chain_state_repository (list[ndarray]) – a list of all generated states in the Markov chain
collected_ensemble (list[ndarray]) – a list of all collected samples from the Markov chain
proposals_repository (list[ndarray]) – a list of all proposed states in the Markov chain
acceptance_flags (list[int]) – a list of flags indicating whether the proposed states were accepted or not
acceptance_probabilities (list[float]) – a list of probabilities of accepting the proposed states
uniform_random_numbers (list[float]) – a list of uniform random numbers used in the acceptance step
map_estimate (ndarray) – the MAP estimate of the distribution
map_estimate_log_density (float) – the log-density of the MAP estimate
chain_diagnostics (ChainDiagnostics | None) – the diagnostics of the Markov chain
chain_time (float) – the time taken to generate the Markov chain
- chain_state_repository: list[ndarray]#
- collected_ensemble: list[ndarray]#
- proposals_repository: list[ndarray]#
- acceptance_flags: list[int]#
- acceptance_probabilities: list[float]#
- uniform_random_numbers: list[float]#
- map_estimate: ndarray#
- map_estimate_log_density: float#
- chain_diagnostics: ChainDiagnostics | None#
- chain_time: float#
- __init__(*, chain_state_repository, collected_ensemble, proposals_repository, acceptance_flags, acceptance_probabilities, uniform_random_numbers, map_estimate, map_estimate_log_density, chain_diagnostics, chain_time)#
- class MCMCSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, burn_in=500, mix_in=10, proposal=None, constraint_test=None)[source]#
Bases:
SamplerConfigs
Configurations for the MCMC sampler
MCMCSampler
.- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.
size (int | None) – dimension of the target distribution to sample
log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled
burn_in (int) – number of sample points to discard before collecting samples
mix_in (int) – number of generated samples between accepted ones (to descrease autocorrelation)
proposal (Proposal | None) – a proposal object to be used for generating new samples
constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
- size: int | None#
- log_density: Callable[[ndarray], float] | None#
- burn_in: int#
- mix_in: int#
- constraint_test: Callable[[ndarray], bool] | None#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, burn_in=500, mix_in=10, proposal=None, constraint_test=None)#
- class MCMCSampler(configs=None)[source]#
Bases:
Sampler
Basic class for MCMC sampling with a chosen (Default is a Gaussian) proposal
- __init__(configs=None)[source]#
Implementation of the MCMC sampling algorithm with a predefined proposal
- Parameters:
configs (dict | MCMCSamplerConfigs | None) – a configurations object. See
MCMCSamplerConfigs
.
- validate_configurations(configs, raise_for_invalid=True)[source]#
A method to check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.
- Parameters:
configs (dict) – a dictionary holding key/value configurations
raise_for_invalid (bool) – if True raise
TypeError
for invalid configrations type/key
- Returns:
True/False flag indicating whether passed coinfigurations dictionary is valid or not
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid
- Raises:
TypeError
is raised if any of the passed keys inkwargs
is invalid/unrecognized- Remarks:
Generally, we don’t want actual implementations in abstract classes, however, this one is provided as good guidance. Derived classes can rewrite it.
- generate_white_noise(size, truncate=False)[source]#
Generate a standard normal random vector of size size with values truncated at -/+3 if truncate is set to True
- Returns:
a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.
- Remarks:
this function returns a numpy array of size size even if size is set to 1
- log_density(state)[source]#
Evaluate the value of the logarithm of the target unscaled posterior density function
- sample(sample_size=1, initial_state=None, full_diagnostics=False)[source]#
Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution
- Parameters:
sample_size (int)
initial_state
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
verbose (bool)
- map_estimate(sample_size=100, initial_state=None, full_diagnostics=False)[source]#
Search for a MAP (maximum aposteriori) estimate by sampling (space exploration) This method returns a single-point estimate of the MAP of the distribution.
- Parameters:
sample_size
initial_state
full_diagnostics
- start_MCMC_sampling(sample_size, initial_state=None, full_diagnostics=False)[source]#
Start the HMC sampling procedure with initial state as passed. Use the underlying configurations for configuring the Hamiltonian trajectory, burn-in and mixin settings.
- Parameters:
sample_size (int) – number of smaple points to generate/collect from the predefined target distribution
initial_state – initial point of the chain (any point that falls in the target distribution or near by it will result in faster convergence). You can try prior mean if this is used in a Bayesian approach
randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- mcmc_chain_diagnostic_statistics(proposals_repository, chain_state_repository, uniform_probabilities, acceptance_probabilities, collected_ensemble, map_estimate=None, acceptance_flags=None, output_dir=None, plot_title='MCMC', filename_prefix='MCMC_Sampling')[source]#
Return diagnostic statistics of the chain such as the rejection rate, acceptance ratio, etc.
- update_random_number_generators(random_seed, update_proposal=True)[source]#
Reset/Update the underlying random_number generator by resetting it’s seed. The random number generator is provided by the RandomNumberGenerationMixin If update_proposal is True do the same for underlying proposal This actually replaces the current random number generator(s) with a new one(s) created from the given random_seed.
- property size#
Return the dimentionsize of the underlying probability space
- property mix_in#
Return the number of generated samples between accepted ones
- property burn_in#
Return the number of sample points to discard before collecting samples
- property constraint_test: Callable[[ndarray], bool]#
Return the function that returns a boolean value True if sample point satisfy any constraint, False otherwise.
- property proposal#
Get a handle of the proposal
- class VALID_INTEGRATORS(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
StrEnum
Valid symplectic integrators for the HMC sampler
- LEAPFROG = '\\Aleapfrog\\Z'#
- VERLET = '\\Averlet\\Z'#
- TWO_STAGE = '\\A2(-|_| )*stage(s)*\\Z'#
- THREE_STAGE = '\\A3(-|_| )*stage(s)*\\Z'#
- class MassMatrix(*, matrix, inverse, sqrt)[source]#
Bases:
PyOEDData
Data container for a covariance matrix of a Gaussian proposal.
- matrix: ndarray | spmatrix#
- inverse: ndarray | spmatrix#
- sqrt: ndarray | spmatrix#
- __init__(*, matrix, inverse, sqrt)#
- class HMCSamplerConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, log_density_grad=None, burn_in=500, mix_in=10, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=20, mass_matrix=1, sparse=True, constraint_test=None)[source]#
Bases:
SamplerConfigs
Configurations for the HMC sampler
HMCSampler
.- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (None | int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.
size (int | None) – dimension of the target distribution to sample
log_density (Callable[[ndarray], float] | None) – log of the (unscaled) density function to be sampled
log_density_grad (Callable[[ndarray], ndarray] | None) – the gradient of the log_density function passed. If None, it will be approximated using finite differences.
burn_in (int) – number of sample points to discard before collecting samples
mix_in (int) – number of generated samples between accepted ones (to descrease autocorrelation)
symplectic_integrator (str) – name of the symplectic integrator to use; acceptable are ‘leapfrog’, ‘2-stage’, ‘3-stage’, where both ‘leapfrog’ and ‘verlet’ are equivalent.
symplectic_integrator_stepsize (float) – the step size of the symplectic integrator
symplectic_integrator_num_steps (int) – number of steps of size symplectic_integrator_stesize
mass_matrix (float | None | ndarray | spmatrix) – mass matrix to be used to adjust sampling the auxilliary Gaussian momentum
constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
- size: int | None#
- log_density: Callable[[ndarray], float] | None#
- log_density_grad: Callable[[ndarray], ndarray] | None#
- burn_in: int#
- mix_in: int#
- symplectic_integrator: str#
- symplectic_integrator_stepsize: float#
- symplectic_integrator_num_steps: int#
- mass_matrix: float | None | ndarray | spmatrix#
- sparse: bool#
- constraint_test: Callable[[ndarray], bool] | None#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', random_seed=None, size=None, log_density=None, log_density_grad=None, burn_in=500, mix_in=10, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=20, mass_matrix=1, sparse=True, constraint_test=None)#
- class HMCSampler(configs=None)[source]#
Bases:
Sampler
,RandomNumberGenerationMixin
- __init__(configs=None)[source]#
Implementation of the HMC sampling algorithm (with multiple choices of the symplectic integrator).
- Parameters:
configs (dict | HMCSamplerConfigs | None) – A configurations object. See
HMCSamplerConfigs
.
- validate_configurations(configs, raise_for_invalid=True)[source]#
A method to check the passed configuratios and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used
- Parameters:
configs (dict | SamplerConfigs) – a dictionary holding key/value configurations
raise_for_invalid (bool) – if True raise
TypeError
for invalid configrations type/key
- Returns:
True/False flag indicating whether passed coinfigurations dictionary is valid or not
- Raises:
see the parameter
raise_for_invalid
- Return type:
bool
- update_configurations(**kwargs)[source]#
Update the configurations of the sampler object with the passed key-value pairs.
- update_mass_matrix(mass_matrix)[source]#
Update the mass matrix (standardize, update configurations, and update locally maintained MassMatrix instance.)
- sample(sample_size=1, initial_state=None, full_diagnostics=False)[source]#
Generate and return a sample of size sample_size. This method returns a list with each entry representing a sample point from the underlying distribution
- Parameters:
sample_size (int)
initial_state
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- map_estimate(sample_size=100, initial_state=None, full_diagnostics=False)[source]#
Search for a MAP (maximum aposteriori) estimate by sampling (space exploration) This method returns a single-point estimate of the MAP of the distribution
- Parameters:
sample_size (int)
initial_state
verbose (bool)
- start_MCMC_sampling(sample_size, initial_state=None, randomize_step_size=False, full_diagnostics=False)[source]#
Start the HMC sampling procedure with initial state as passed. Use the underlying configurations for configuring the Hamiltonian trajectory, burn-in and mixin settings.
- Parameters:
sample_size (int) – number of smaple points to generate/collect from the predefined target distribution
initial_state – initial point of the chain (any point that falls in the target distribution or near by it will result in faster convergence). You can try prior mean if this is used in a Bayesian approach
randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration
full_diagnostics (bool) – if True all generated states will be tracked and kept for full disgnostics, otherwise, only collected samples are kept in memory
- generate_white_noise(size, truncate=False)[source]#
- Generate a standard normal random vector of size size with values truncated
at -/+3 if truncate is set to True
- Returns:
a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.
- Remarks:
this function returns a numpy array of size size even if size is set to 1
- mass_matrix_matvec(momentum)[source]#
Multiply the mass matrix (in the configurations) by the passed momentum
- mass_matrix_inv_matvec(momentum)[source]#
Multiply the inverse of the mass matrix (in the configurations) by the passed momentum
- mass_matrix_sqrt_matvec(momentum)[source]#
Multiply the Square root (Lower Cholesky factor) of the mass matrix (in the configurations) by the passed momentum
- log_density(state)[source]#
Evaluate the value of the logarithm of the target unscaled posterior density function
- log_density_grad(state)[source]#
Evaluate the gradient of the logarithm of the target unscaled posterior density function
- potential_energy(state)[source]#
Evaluate the value of the potential energy at the given state. The potential energy is the negative value of the logarithm of the unscaled posterior density function.
- potential_energy_grad(state)[source]#
Evaluate the gradient of the potential energy at the given state. The potential energy is the negative value of the logarithm of the unscaled posterior density function
- kinetic_energy(momentum)[source]#
Evaluate the Kinetic energy of the posterior; this is independent from the state and is evaluated as the weighted l2 norm of the momentum (scaled by the inverse of hte mass matrix); This is half of the squared Mahalanobis distance of the Gaussian momentum.
- total_Hamiltonian(momentum, state)[source]#
Evaluate the value of the total energy function: Hamiltonian = kinetic energy + potential energy
- build_Hamiltonian_trajectory(momentum, state, step_size, num_steps, randomize_step_size=False)[source]#
Given the current momentum and state pair of the Hamiltonian system, generate a trajectory of (momentum, state).
- Parameters:
momentum
state
randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration
- apply_symplectic_integration(momentum, state, step_size, num_steps, randomize_step_size=False, symplectic_integrator='3-stage')[source]#
Apply one full step of size step_size of the symplectic integrator to the Hamiltonian system
- Parm momentum:
- Parameters:
state
num_steps (int)
step_size (float)
symplectic_integrator (str) – name of the symplectic integrator to use; acceptable are: ‘verlet’, ‘leapfrog’, ‘2-stage’, ‘3-stage’, where both ‘leapfrog’ and ‘verlet’ are equivalent
randomize_step_size (bool) – if True a tiny random number is added to the passed step size to help improve space exploration
- Returns:
a tuple (p, s) where p and s are the integrated (forward in time) momentum and state respectively
- mcmc_chain_diagnostic_statistics(proposals_repository, chain_state_repository, uniform_probabilities, acceptance_probabilities, collected_ensemble, map_estimate=None, acceptance_flags=None, output_dir=None, plot_title='MCMC', filename_prefix='MCMC_Sampling')[source]#
Return diagnostic statistics of the chain such as the rejection rate, acceptance ratio, etc.
- property size#
Return the dimentionsize of the underlying probability space
- property mix_in#
Return the number of generated samples between accepted ones
- property burn_in#
Return the number of sample points to discard before collecting samples
- property symplectic_integrator#
Return the name of the symplectic integrator used
- property symplectic_integrator_stepsize#
Return the step size of the symplectic integrator
- property symplectic_integrator_num_steps#
Return the number of steps of the symplectic integrator
- property mass_matrix: MassMatrix#
Return the constructed mass matrix object
- property constraint_test: Callable[[ndarray], bool]#
Return the function that returns a boolean value True if sample point satisfy any constraint, False otherwise.
- create_mcmc_sampler(size, log_density, burn_in=100, mix_in=10, constraint_test=None, proposal_variance=0.1, output_dir=None, random_seed=None)[source]#
- Given the size of the target space, and a function to evalute log density,
create and return an
MCMCSampler
instance/object to generate samples using standard MCMC sampling approach. Configurations/settings can be updated after inistantiation
- Parameters:
size (int) – dimension of the target distribution to sample
log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar
burn_in (int) – number of steps to drop before collecting samples in the chain (for convergence)
mix_in (int) – number of steps to drop between collected samples (reduce autocorrelation)
proposal_variance – variance of the Gaussian proposal
constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules
- Returns:
instance of
MCMCSampler
(with some or all configurations passed)
- create_hmc_sampler(size, log_density, log_density_grad=None, burn_in=100, mix_in=5, symplectic_integrator='verlet', symplectic_integrator_stepsize=0.01, symplectic_integrator_num_steps=10, mass_matrix=0.1, constraint_test=None, random_seed=None, output_dir=None)[source]#
- Given the size of the target space, and a function to evalute log density,
create and return an
HMCSampler
instance/object to generate samples using HMC sampling approach. Configurations/settings can be updated after inistantiation
- Parameters:
size (int) – dimension of the target distribution to sample
log_density – a callable (function) to evaluate the logarithm of the target density (unscaled); this function takes one vector (1d array/iterable) of length equal to size, and returns a scalar
log_density_grad – a callable (function) to evaluate the gradient of log_density; this function takes one vector (1d array/iterable) of length equal to size, and returns a vector of the same length. If None is passed, the gradient is automatically evaluated using finite differences (FD).
burn_in (int) – number of steps to drop before collecting samples in the chain (for convergence)
mix_in (int) – number of steps to drop between collected samples (reduce autocorrelation)
symplectic_integrator_stepsize (float) – (positive scalar) the step size of the symplectic integrator
symplectic_integrator_num_steps (int) – (postive integer) number of steps of size symplectic_integrator_stesize taken before returnig the next proposed point over the Hamiltonian trajectory
mass_matrix – (nonzero scalar or SPD array of size size x size) mass matrix to be used to adjust sampling the auxilliary Gaussian momentum
constraint_test – a function that returns a boolean value True if sample point satisfy any constrints, and False otherwise; ignored if None, is passed.
random_seed – random seed used when the object is initiated to keep track of random samples This is useful for reproductivity. If None, random seed follows numpy.random.seed rules
- Returns:
instance of
HMCSampler
(with some or all configurations passed)
- banana_potential_energy_value(state, a=2.15, b=0.75, rho=0.9)[source]#
Potential energy of the posterir. This is dependent on the target state, not the momentum. It is the negative of the posterior-log, and MUST be implemented for each distribution
- banana_potential_energy_gradient(state, a=2.15, b=0.75, rho=0.9)[source]#
Gradient of the Potential energy of the posterir.
- banana_log_density(state, a=2.15, b=0.75, rho=0.9)[source]#
The logarithm of the banana distribution PDF
Proposals#
- class GaussianProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Gaussian', random_seed=None, size=None, mean=None, variance=1.0, constraint_test=None, constraint_max_trials=100)[source]#
Bases:
ProposalConfigs
Configurations for the GaussianProposal class.
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
name (str) – name of the distribution
random_seed (int | None) – random seed used for pseudo random number generation
size (int | None) – dimension of the target distribution to sample.
mean (float | ndarray | None) – mean of the proposal. If None (default), the mean is set to the current state passed to the sample or __call__ method.
variance (float | ndarray | spmatrix) – variance of the proposal. If None, the variance is set to 1.0.
constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point satisfy any desired constraints, and False otherwise; ignored if None, is passed.
constraint_max_trials (int) – the maximum number of trials to satisfy constraints (e.g., bounds) for each sample point. Default is 100.
- size: int | None#
- mean: float | ndarray | None#
- variance: float | ndarray | spmatrix#
- constraint_test: Callable[[ndarray], bool] | None#
- constraint_max_trials: int#
- sparse = True#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Gaussian', random_seed=None, size=None, mean=None, variance=1.0, constraint_test=None, constraint_max_trials=100)#
- class GaussianCovarianceMatrix(*, matrix, inverse, sqrt)[source]#
Bases:
PyOEDData
Data container for a covariance matrix of a Gaussian proposal.
- matrix: ndarray | spmatrix#
- inverse: ndarray | spmatrix#
- sqrt: ndarray | spmatrix#
- __init__(*, matrix, inverse, sqrt)#
- class GaussianProposal(configs=None)[source]#
Bases:
Proposal
,RandomNumberGenerationMixin
A class implementing a Gaussian proposal for MCMC sampling.
- Parameters:
configs (dict | GaussianProposalConfigs | None) – Configurations object, see
GaussianProposalConfigs
- validate_configurations(configs, raise_for_invalid=True)[source]#
A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.
- Parameters:
configs – an object holding configurations
raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.
- Returns:
True/False flag indicating whether passed configurations dictionary is valid or not
- generate_white_noise(size, truncate=False)[source]#
- Generate a standard normal random vector of size size with values truncated at
-/+3 if truncate is set to True
- Parameters:
size (int) – dimension of the probability space to sample
truncate (bool) – if True, truncate the samples at -/+3, that is any sample point/entry above 3 is set to 3, and any value below -3 is set to -3.
- Returns:
a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.
- Remarks:
this function returns a numpy array of size size even if size is set to 1
- Return type:
ndarray
- covariance_matrix_matvec(state)[source]#
Multiply the mass matrix (in the configurations) by the passed state/vector
- covariance_matrix_inv_matvec(state)[source]#
Multiply the inverse of the mass matrix (in the configurations) by the passed state/vector
- covariance_matrix_sqrt_matvec(state)[source]#
Multiply the Square root (Lower Cholesky factor) of the mass matrix (in the configurations) by the passed state/vector
- sample(sample_size=1, initial_state=None)[source]#
Generate and return a sample of size sample_size from a Gaussian distribution centered around the passed initial_state. If the initial_state is not passed, the preconfigured mean is used. Thus, the proposal mean must be set in configurations, otherwise a TypeError is raised
- Returns:
1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.
- Return type:
ndarray
- pdf(state, normalize=False, log=False, mean=None)[source]#
Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state/vector.
If the mean is not passed, the preconfigured mean is used, thus, the proposal mean must be set in configurations, otherwise a TypeError is raised
- Parameters:
state – What to evaluate the density function at.
normalize (bool) – scale the PDF by the normalization factor
log (bool) – if True, evaluate the gradient of the logarithm of the PDF
- Returns:
the value of the density function evaluated at the passed state/vector.
- pdf_gradient(state, normalize=False, log=False, mean=None)[source]#
Evaluate the gradient of the density function at the passed state/vector.
If the mean is not passed, the preconfigured mean is used, thus, the proposal mean must be set in configurations, otherwise an error raised
- Parameters:
state
normalize (bool) – scale the PDF by the normalization factor
log (bool) – if True, evaluate the gradient of the logarithm of the PDF
- Returns:
the value of the density function evaluated at the passed state/vector.
- log_density(state, normalize=False, mean=None)[source]#
Evaluate the logarithm of the density function at the passed state, where the distrubution mean is adjusted to mean. If mean is None the configured mean is used. This parameter is passed to satisify the needs for the case where the proposal mean changes frequently, e.g., MCMC.
- Parameters:
state
mean
normalize (bool) – scale the PDF by the normalization factor
normalize
- Returns:
the logarithm of the density function evaluated at the passed state.
- log_density_gradient(state, normalize=False, mean=None)[source]#
Evaluate the gradient of the logarithm of the density function at the passed state, where the distrubution mean is adjusted to mean. If mean is None the configured mean is used. This parameter is passed to satisify the needs for the case where the proposal mean changes frequently, e.g., MCMC.
- Parameters:
state – the state to evaluate the gradient at
normalize – scale the PDF by the normalization factor
mean – the mean of the proposal distribution. If None, the configured mean is used.
- Returns:
the gradient of the logarithm of the density function evaluated at the passed state.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid.
- property size#
Return the dimentionsize of the underlying probability space
- property mean#
Return the mean of the Gaussian proposal
- property covariance: GaussianCovarianceMatrix#
Return the covariance matrix of the Gaussian proposal
- property sparse#
Return the sparsity configuration
- property constraint_test#
Return the constraint test function
- property constraint_max_trials#
Return the maximum number of trials to satisfy constraints
- class GaussianMixtureModelProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GMM Proposal', random_seed=123, size=None, num_components=None, weights=None, means=None, variances=None, constraint_test=None, constraint_max_trials=100, sparse=True)[source]#
Bases:
ProposalConfigs
Configurations for the GaussianMixtureModelProposal class.
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
name (str) – name of the distribution
random_seed (int) – random seed used for pseudo random number generation
size (int | None) – dimension of the target distribution to sample.
num_components (int | None) – the number of components in the mixture. This is mandatory.
weights (list[float] | None) – a list of weights of each of the GMM component. These must sum up to 1. This is mandatory.
means (list[float | ndarray] | None) – a list of means of each of the components of the GMM model. If None, all components will be centered around the current state passed to the sample or __call__ method.
variances (list[float | ndarray | spmatrix] | None) – a list of variances of each of the components of the GMM model. If None, all components will have a variance of 1.0.
- size: int | None#
- num_components: int | None#
- weights: list[float] | None#
- means: list[float | ndarray] | None#
- variances: list[float | ndarray | spmatrix] | None#
- constraint_test: Callable[[ndarray], bool] | None#
- constraint_max_trials: int#
- sparse: bool#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GMM Proposal', random_seed=123, size=None, num_components=None, weights=None, means=None, variances=None, constraint_test=None, constraint_max_trials=100, sparse=True)#
- class GaussianMixtureModelProposal(configs=None)[source]#
Bases:
Proposal
,RandomNumberGenerationMixin
A class implementing a Gaussian Mixture Model (GMM) proposal for MCMC sampling.
- Parameters:
configs (dict | GaussianMixtureModelProposalConfigs | None) – A configurations object, see
GaussianMixtureModelProposalConfigs
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid.
- validate_configurations(configs, raise_for_invalid=True)[source]#
A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.
- Parameters:
configs (dict | GaussianMixtureModelProposalConfigs) – an object holding configurations
raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.
- Returns:
True/False flag indicating whether passed configurations dictionary is valid or not
- Return type:
bool
- generate_white_noise(size, truncate=False)[source]#
- Generate a standard normal random vector of size size with values truncated
at -/+3 if truncate is set to True
- Parameters:
size (int) – dimension of the probability space to sample
truncate (bool) – if True, truncate the samples at -/+3, that is any sample point/entry above 3 is set to 3, and any value below -3 is set to -3.
- Returns:
a numpy array of size size sampled from a standard multivariate normal distribution of dimension size with mean 0 and covariance matrix equals an identity matrix.
- Remarks:
this function returns a numpy array of size size even if size is set to 1
- covariance_matrix_matvec(state, component)[source]#
Multiply the covariance matrix of the choosen component by the passed state/vector
- Parameters:
state – the state (random variable) to multiple covariance by
component (int) – the index of the component to multiply by.
- covariance_matrix_inv_matvec(state, component)[source]#
Multiply the inverse of the covariance matrix of the choosen component by the passed state/vector
- Parameters:
state – the state (random variable) to multiple covariance by
component (int) – the index of the component to multiply by.
- covariance_matrix_sqrt_matvec(state, component)[source]#
Multiply the square root (lower Cholesky factor) of the covariance matrix of the choosen component by the passed state/vector
- Parameters:
state – the state (random variable) to multiple covariance by
component (int) – the index of the component to multiply by.
- sample_component(component, sample_size=1, initial_state=None)[source]#
Generate and return a sample of size sample_size from a Gaussian distribution centered around the passed initial_state. If the initial_state is not passed, the preconfigured mean of the corresponding component is used, thus, the proposal mean must be set in configurations, otherwise an error is raised
- Parameters:
component (int) – the index of the component to multiply by.
sample_size (int) – the number of samples to generate
initial_state – the state to propose around. If None, the preconfigured mean of the component is used.
- Returns:
1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.
- Return type:
ndarray
- sample(sample_size=1, initial_state=None)[source]#
Generate GMM samples
- Parameters:
sample_size (int) – the number of samples to generate from the GMM model
initial_state – the state to propose around. If None, the preconfigured mean of the component is used.
- Returns:
a numpy array of shape (sample_size, size) sampled from the GMM model
- Return type:
ndarray
- pdf(state, normalize=False, log=False, mean=None)[source]#
Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state/vector.
If a value of the mean is passed, all components’s means are shifted to this state. This gives flexibility in sampling using methods such as MCMC
- Parameters:
state
normalize (bool) – scale the PDF by the normalization factor
log (bool) – if True, evaluate the gradient of the logarithm of the PDF
- Returns:
the value of the density function evaluated at the passed state/vector.
- log_density(state, normalize=False, mean=None)[source]#
Evaluate the logarithm of the density function at the passed state.
If a value of the mean is passed, all components’s means are shifted to this state. This gives flexibility in sampling using methods such as MCMC
- Parameters:
state
component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …
normalize (bool) – scale the PDF by the normalization factor
- Returns:
the logarithm of the density function evaluated at the passed state.
- pdf_component(state, component, normalize=False, mean=None)[source]#
Evaluate the value of the density function of one component (a Gaussian) of the GMM model.
If a value of the mean is passed, the coponent’s means is shifted to this, otherwise, the one in the configurations dictionary is used. This gives flexibility in sampling using methods such as MCMC
- Parameters:
state
component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …
normalize (bool) – scale the PDF by the normalization factor
- Returns:
the logarithm of the density function evaluated at the passed state.
- log_density_component(state, component, normalize=False, mean=None)[source]#
Evaluate the logarithm of the value of the density function of one component (a Gaussian) of the GMM model.
If a value of the mean is passed, the coponent’s means is shifted to this, otherwise, the one in the configurations dictionary is used. This gives flexibility in sampling using methods such as MCMC
- Parameters:
state
component (int) – the index of the component to multiply by. The components are enumerated 0, 1, …
normalize (bool) – scale the PDF by the normalization factor
- Returns:
the logarithm of the density function evaluated at the passed state.
- property size#
Return the dimentionsize of the underlying probability space
- property num_components#
Return the number of components in the GMM model
- property weights#
Return the weights of the GMM model
- property means#
Return the means of the GMM model
- property sparse#
Return the sparsity configuration
- property constraint_test#
Return the constraint test function
- property constraint_max_trials#
Return the maximum number of trials to satisfy constraints
- property covariances: list[GaussianCovarianceMatrix]#
Return the covariance matrices of the GMM model
- class ExponentialProposalConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Exponential', random_seed=123, size=None, scale=None, constraint_test=None, constraint_max_trials=100)[source]#
Bases:
ProposalConfigs
Configurations for the Exponential proposal class.
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionlity in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
name (str) – name of the distribution
size (int | None) – dimension of the target distribution to sample.
scale (float | ndarray | None) – scale parameter of the proposal. This may be a scalar or a vector of size size.
constraint_test (Callable[[ndarray], bool] | None) – a function that returns a boolean value True if sample point
constraint_max_trials (int) – the maximum number of trials to satisfy constraints
random_seed (int) – random seed used when the object is initiated to keep track of random samples. This is useful for reproductivity. If None, random seed follows numpy.random.seed rules.
- size: int | None#
- scale: float | ndarray | None#
- constraint_test: Callable[[ndarray], bool] | None#
- constraint_max_trials: int#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Exponential', random_seed=123, size=None, scale=None, constraint_test=None, constraint_max_trials=100)#
- class ExponentialProposal(configs=None)[source]#
Bases:
Proposal
,RandomNumberGenerationMixin
A class implementing an Exponential proposal for MCMC sampling. The distribution assumes independent components of the random vector.
- __init__(configs=None)[source]#
Initialize the Exponential proposal object.
- Parameters:
configs (dict | ExponentialProposalConfigs | None) – A configurations object, see
ExponentialConfigs
- validate_configurations(configs, raise_for_invalid=True)[source]#
A method to check the passed configurations and make sure they are conformable with each other, and with current configurations once combined. This guarantees that any key-value pair passed in configs can be properly used.
- Parameters:
configs (dict | ExponentialProposalConfigs) – an object holding configurations
raise_for_invalid (bool) – If True, raise an exception if any of the passed configurations is invalid.
- Returns:
True/False flag indicating whether passed configurations dictionary is valid or not
- Return type:
bool
- sample(sample_size=1, initial_state=None)[source]#
Generate and return a sample of size sample_size from an iid Exponential distribution with predefine scale parameter. The initial_state is not used here; it is added to unify the
sample()
interface across proposals.- Returns:
1d (if internal dimension/size is 1) or 2d numpy array (if internal dimension/size is more than 1) holding samples. Each entry sample[i] (row element) retrieves a sample point from the underlying probability distribution.
- pdf(state, normalize=False, log=False)[source]#
Evaluate the value of the density function (normalized or upto a fixed scaling constant) at the passed state.
- Parameters:
state
normalize (bool) – scale the PDF by the normalization factor
log (bool) – if True, evaluate the gradient of the logarithm of the PDF
- Returns:
the value of the density function evaluated at the passed state.
- pdf_gradient(state, normalize=False, log=False)[source]#
Evaluate the gradient of the density function at the passed state/vector.
- Parameters:
state
normalize (bool) – scale the PDF by the normalization factor
log (bool) – if True, evaluate the gradient of the logarithm of the PDF
- Returns:
the value of the density function evaluated at the passed state/vector.
- log_density(state, normalize=False)[source]#
Evaluate the logarithm of the density function at the passed state.
- Parameters:
state
normalize (bool)
- Returns:
the logarithm of the density function evaluated at the passed state.
- log_density_gradient(state)[source]#
Evaluate the gradient of the logarithm of the density function at the passed state.
- Parameters:
state
normalize (bool)
- Returns:
the gradient of the logarithm of the density function evaluated at the passed state.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, and lookup each in the configurations, and update as nessesary/possible/valid
- property size#
Return the dimentionsize of the underlying probability space
- property scale#
Return the scale parameter of the exponential proposal
- property constraint_test#
Return the constraint test function
- property constraint_max_trials#
Return the maximum number of trials to satisfy constraints
- visualize_proposal(proposal, sample_size=1000, xlim=None, ylim=None, title=None, grid_size=200, labels=None, linewidth=1.0, markersize=2, fontsize=18, fontweight='bold', keep_plots=False, output_dir=None, filename_prefix=None, verbose=False)[source]#
- Parameters:
proposal (Proposal)
sample_size (int) – size of the sample to generate for scatter plotting
xlim – x-limit of plots (only used for 2D case)
ylim – y-limit of plots (only used for 2D case)
title – title to be added to created plot(s)
grid_size – number of discretization grid points to create 2D mesh for plots (only used for 2D case)
labels – iterable of labels to be used for the variables on plots (X_i auto used if not passed)
linewidth – width of lines (e.g., for contour plots)
markersize – markers in scatter plots
fontsize – general font size on all plots
fontweight – font weighti
keep_plots – if False all plots will be closed by calling
matplotlib.pyplot.close('all')()
output_dir – location to save files
filename_prefix – if None, a prefix with proposal name will be chosen, e.g., “Exponential_Proposal_”
verbose (bool) – screen verbosity
- Remarks:
Set keep_plots to True, for example, if you are doing interactive plotting or using a notebook to show plots.
- create_Gaussian_proposal(size, mean=None, variance=1, random_seed=None, output_dir=None, constraint_test=None)[source]#
Given the size of the target space, create and return an
GaussianProposal
instance/object to generate samples using Gaussian proposal centered around current state (or predefined mean) Configurations/settings can be updated after inistantiationThis function shows how to create
GaussianProposal
instances (with some or all configurations passed)
- create_GaussianMixtureModel_proposal(size=2, num_components=3, weights=[0.5, 0.25, 0.25], means=[(0, 0), (0, 0.25), (0.25, 0)], variances=[0.04, 0.04, 0.04], random_seed=None, constraint_test=<function <lambda>>)[source]#
Given the size of the target space, create and return an
GaussianMixtureModelProposal
instance/object to generate samples using GMM proposal centered around current state (or predefined mean) Configurations/settings can be updated after inistantiationThis function shows how to create
GaussianProposal
instances (with some or all configurations passed)
- create_Exponential_proposal(size, scale=1, random_seed=None, constraint_test=None, output_dir=None)[source]#
Given the size of the target space, and the scale parameter, create and return an
ExponentialProposal
instance/object to generate samples using and iid Exponential proposal Configurations/settings can be updated after inistantiation- This function shows how to create
ExponentialProposal
instances (with some or all configurations passed)
- This function shows how to create