Probability Distributions/Models#

Note

All distribution classes inherit from the distribution base class (Distribution), and each distribution class/object is associated with a configurations class derived from the distribution configurations base class (DistributionConfigs).

Multivariate Bernoulli Model#

class BernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Mulitvariate Bernoulli Distribution (iid)', random_seed=None, parameter=0.5)[source]#

Bases: DistributionConfigs

Configurations class for the Bernoulli abstract base class. This class inherits functionality from DistributionConfigs and only adds new class-level variables which can be updated as needed.

See DistributionConfigs for more details on the functionality of this class along with a few additional fields. Otherwise Bernoulli provides the following fields:

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables extra functionality in debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (int | None) – random seed used for pseudo random number generation

  • name (str) – name of the distribution

  • parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution.

parameter: float | Iterable[float]#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Mulitvariate Bernoulli Distribution (iid)', random_seed=None, parameter=0.5)#
class Bernoulli(configs=None)[source]#

Bases: Distribution

An implementation of the multivariate Bernoulli Distribution with independent components (no covariances).

Parameters:

configs (dict | BernoulliConfigs | None) – (optional) configurations for the model

__init__(configs=None)[source]#

Initialize the random number generator

validate_configurations(configs, raise_for_invalid=True)[source]#

Validation stage for the passed configs.

Parameters:

configs (dict | BernoulliConfigs) – configurations to validate. If a BernoulliConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.

Raises:
  • PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.

  • AttributeError – if any (or a group) of the configurations does not exist in the model configurations BernoulliConfigs.

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.

update_parameter(p)[source]#

Update the success probability and any dependent values/configurations

sample(sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#

Sample a Bernoulli random variable (1d or multivariate) according to the probability of success p, that is, \(p := P(x=1)\). If antithetic is True, sample_size must be even.

Parameters:
  • sample_size (int) – size of the sample to generate

  • antithetic (bool) – if True, sample_size must be even

  • dtype (type) – data type of the returned array

Returns:

bernoulli_sample: array of shape sample_size x n, where n is the size of p

Raises:

TypeError – if sample_size is invalid

Note

If p is scalar or iterable of length 1, this will be 1d array of size=sample_size. Otherwise, if p is multivariate, this will be 2d array with each row representing one sample.

expect(func, objective_value_tracker=None)[source]#

Calculate the expected value of a function (func) which accepts w as parameter

pmf(x, joint=True)[source]#

Calculate the value of the probability mass function (PMF) of an uncorrelated multivariate Bernoulli distribution, evaluated at a given binary state/realization x, with parameters defined by the underlying probability of success.

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • joint (bool) – if True, the joint PMF value is returned; otherwise, the marginal PMF values for all entries are returned.

Returns:

joint PMF (if joint is True; default) or the marginal PMF values (if joint is False)

Raises:

TypeError – if the passed x has the wrong shape/size
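
The relation between the joint and marginal values can be illustrated with a minimal standalone sketch. It does not call PyOED; the helper name `bernoulli_pmf` and its argument order are assumptions made for this example only.

```python
from math import prod


def bernoulli_pmf(x, p, joint=True):
    """PMF of independent Bernoulli entries at binary state x (hypothetical helper)."""
    if len(x) != len(p):
        raise TypeError("x and p must have the same length")
    # Marginal PMF of entry i: p_i if x_i == 1, otherwise 1 - p_i
    marginals = [pi if xi else 1.0 - pi for xi, pi in zip(x, p)]
    # Independence: the joint PMF is the product of the marginals
    return prod(marginals) if joint else marginals


p = [0.2, 0.5, 0.9]
x = [1, 0, 1]
print(bernoulli_pmf(x, p, joint=False))  # one marginal value per entry
print(bernoulli_pmf(x, p))               # a single joint value
```

Because the entries are independent, the joint value is always the product of the marginal values, which is why a single flag can switch between the two outputs.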

log_pmf(x, joint=True)[source]#

Calculate the value of the log-PMF (logarithm of the probability mass function) of an uncorrelated multivariate Bernoulli distribution.

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • joint (bool) – if True, the logarithm of the joint PMF is returned; otherwise, the logarithms of the marginal PMF values for all entries are returned.

Returns:

logarithm of the joint PMF (if joint is True; default) or the logarithmic values of the marginal PMF (if joint is False)

Raises:

TypeError – if the passed x has the wrong shape/size

grad_pmf(x, joint=True)[source]#

Calculate the gradient of the probability mass function (PMF) of an uncorrelated multivariate Bernoulli distribution, evaluated at a given binary state x, with parameters theta. The derivative is taken with respect to the distribution parameters (success probabilities), not the realization of the random variable x.

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • joint (bool) – if True, the gradient of the joint PMF is returned; otherwise, the gradient of the marginal PMF values for all entries is returned.

Returns:

gradient of the joint PMF (if joint is True; default) or the gradient of the marginal PMF values (if joint is False)

Raises:

TypeError – if the passed x has the wrong shape/size

grad_log_pmf(x, joint=True, zero_bounds=True)[source]#
Calculate the gradient of the log-PMF (logarithm of the probability mass function) with respect to the distribution parameters (success probabilities).

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • joint (bool) – if True, the joint PMF is considered; otherwise, the marginal PMF is used.

Returns:

the gradient of the log probability

Note

Because the modeled Bernoulli random variables are assumed uncorrelated, the gradient of the joint log-probability is made up of the partial derivatives of the log-probability of each entry; thus, the result is the same whether joint is True or False.
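
As a rough illustration of this gradient, the sketch below evaluates it analytically and checks it against central finite differences. It is independent of PyOED: the function names are hypothetical, and the success probabilities are assumed to lie strictly inside (0, 1), so the boundary handling controlled by zero_bounds is not reproduced.

```python
from math import log


def bernoulli_log_pmf(x, p):
    """Joint log-PMF of independent Bernoulli entries at binary state x."""
    return sum(log(pi) if xi else log(1.0 - pi) for xi, pi in zip(x, p))


def bernoulli_grad_log_pmf(x, p):
    """Gradient of the joint log-PMF with respect to each success probability."""
    # Entry i only involves p_i: d/dp_i log(p_i) = 1/p_i when x_i = 1,
    # and d/dp_i log(1 - p_i) = -1/(1 - p_i) when x_i = 0.
    return [1.0 / pi if xi else -1.0 / (1.0 - pi) for xi, pi in zip(x, p)]


def finite_difference_grad(x, p, eps=1e-6):
    """Central finite differences, used only to sanity-check the analytic gradient."""
    grad = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += eps
        down[i] -= eps
        grad.append((bernoulli_log_pmf(x, up) - bernoulli_log_pmf(x, down)) / (2 * eps))
    return grad


p = [0.2, 0.5, 0.9]
x = [1, 0, 1]
print(bernoulli_grad_log_pmf(x, p))
print(finite_difference_grad(x, p))
```

Each gradient entry depends only on its own probability, which is the reason the joint and marginal settings coincide.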

index_to_binary_state(k, dtype=<class 'bool'>)[source]#

Return the binary state \((v_1, v_2, \dots)\) with index k; the dimension of the state equals the size of this distribution.

Note

This is a wrapper around the utility function pyoed.utility.math.index_to_binary_state, added here only for convenience.

index_from_binary_state(state)[source]#

Inverse of index_to_binary_state(). Return the index k corresponding to the passed state (of dimension equal to size).

Note

This is a wrapper around the utility function pyoed.utility.math.index_from_binary_state, added here only for convenience.
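
The idea of mapping an index to a binary state and back can be sketched as follows. This is not the PyOED utility: the function names, the 0-based indexing, and the bit ordering (entry i holds bit i of k) are assumptions chosen for illustration, and the library may use a different convention.

```python
def index_to_state(k, size, dtype=bool):
    """Binary state of length `size` for index k (assumed 0-based, least significant bit first)."""
    if not 0 <= k < 2 ** size:
        raise ValueError(f"index {k} is out of range for size {size}")
    return [dtype((k >> i) & 1) for i in range(size)]


def index_from_state(state):
    """Inverse mapping: rebuild the index from the binary state."""
    return sum(1 << i for i, bit in enumerate(state) if bit)


state = index_to_state(5, size=4)
print(state)
print(index_from_state(state))
```

Whatever convention is used, the two functions must be exact inverses of each other over all 2^size states.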

property parameter#

Return the underlying probability of success

property success_probability#

Return the underlying probability of success

property size#

Return the dimension (size) of the underlying probability space

Poisson Binomial Distribution#

class PoissonBinomialConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Poisson Binomial Distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')[source]#

Bases: BernoulliConfigs

Configurations class for the PoissonBinomial abstract base class. This class inherits functionality from PyOEDConfigs and only adds new class-level variables which can be updated as needed.

See PyOEDConfigs for more details on the functionality of this class along with a few additional fields. Otherwise PoissonBinomial provides the following fields:

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables extra functionality in debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (int | None) – random seed used for pseudo random number generation

  • parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution.

  • name (str) – name of the distribution

  • R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.

R_function_evaluation_method: str#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Poisson Binomial Distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')#
class PoissonBinomial(configs=None)[source]#

Bases: Bernoulli

An implementation of the Poisson-binomial distribution which models the sum of independent (non-identical) Bernoulli trials. This version uses Discrete Fourier Transform to calculate probabilities and derivatives following Method 1 in [1] which evaluates \(R(n, S)\) as a series. For details see [2].

Parameters:

configs (dict | PoissonBinomialConfigs | None) – (optional) configurations for the model

References:

  1. Sean X. Chen and Jun S. Liu. “Statistical applications of the Poisson-binomial and conditional Bernoulli distributions.” Statistica Sinica (1997): 875-892.

  2. Ahmed Attia. “Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement.” arXiv preprint arXiv:2406.05830 (2024).

__init__(configs=None)[source]#

Initialize the random number generator

validate_configurations(configs, raise_for_invalid=True)[source]#

Validation stage for the passed configs.

Parameters:

configs (dict | PoissonBinomialConfigs) – configurations to validate. If a PoissonBinomialConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.

Raises:
update_configurations(**kwargs)[source]#

Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.

calculate_w(p, dtype=<class 'decimal.Decimal'>, log=False, undefined_as_nan=False)[source]#

Calculate Bernoulli weights w from success probabilities p. The weights are defined as:

\[w_i = \frac{p_i}{1-p_i}\]

Note

The weights cannot be evaluated for any value of p equal to 1.

Note

This is a wrapper around RFunction.calculate_w()

Parameters:
  • p (Iterable[float]) – a sequence of success probabilities.

  • dtype (type) – data type (must be a callable that converts the input to the desired data type)

  • log (bool) – return the logarithm of w if True

  • undefined_as_nan (bool) – if True, set the value of w to nan for any probability outside the domain [0, 1)

Returns:

a sequence (of the same length as p) with weights of type decimal.Decimal

Raises:

ValueError – if any of the probabilities is not in the interval [0, 1) and undefined_as_nan is False

Return type:

Iterable[Decimal | float]
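
The weight formula above can be sketched without PyOED as follows. The helper name `bernoulli_weights` is hypothetical, the `log` option is omitted, and the conversion through `str()` is an illustrative choice for keeping decimal values exact; the library's own conversion may differ.

```python
from decimal import Decimal


def bernoulli_weights(p, dtype=Decimal, undefined_as_nan=False):
    """Compute w_i = p_i / (1 - p_i) for each success probability (hypothetical helper)."""
    weights = []
    for pi in p:
        if 0 <= pi < 1:
            # str() keeps values such as 0.2 exact when dtype is Decimal
            value = dtype(str(pi))
            weights.append(value / (1 - value))
        elif undefined_as_nan:
            # The weight is undefined outside [0, 1); mark it instead of failing
            weights.append(dtype("nan"))
        else:
            raise ValueError(f"probability {pi} is outside the interval [0, 1)")
    return weights


print(bernoulli_weights([0.2, 0.5, 0.75]))
print(bernoulli_weights([0.2, 1.0], dtype=float, undefined_as_nan=True))
```

A high-precision type such as Decimal is useful here because the weights grow without bound as a probability approaches 1.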

pmf(n)[source]#

Calculate the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

value of the PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer
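
For intuition about what this PMF contains, the following standalone sketch computes all values of the Poisson-binomial PMF at once. Note that it swaps in a simple dynamic-programming convolution rather than the R-function evaluation used by the class, and the function name is an assumption of this example.

```python
def poisson_binomial_pmf(p):
    """Return [P(sum = n) for n = 0..len(p)] by adding one Bernoulli trial at a time."""
    pmf = [1.0]  # with zero trials, the sum is 0 with probability 1
    for pi in p:
        updated = [0.0] * (len(pmf) + 1)
        for n, mass in enumerate(pmf):
            updated[n] += mass * (1.0 - pi)  # the new trial fails: sum stays at n
            updated[n + 1] += mass * pi      # the new trial succeeds: sum becomes n + 1
        pmf = updated
    return pmf


probabilities = [0.1, 0.6, 0.3]
pmf = poisson_binomial_pmf(probabilities)
print(pmf)
print(sum(pmf))
```

When all success probabilities are equal, the result reduces to the ordinary binomial PMF.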

log_pmf(n)[source]#

Calculate the log probability (log of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the logarithm of the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

logarithm of the value of the PMF of the Poisson-Binomial model/distribution.

Raises:
  • TypeError – if n is not a non-negative integer

  • ValueError – if the probability mass function value is zero at n.

grad_pmf(n)[source]#

Calculate the derivative/gradient of the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the gradient of the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Note

This function calculates the gradient of pmf() with respect to the distribution parameter, i.e., the probabilities of success.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

gradient of the PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer

grad_log_pmf(n, zero_bounds=True)[source]#

Calculate the derivative/gradient of the log-probability (logarithm of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the gradient of the log-probability mass function (log-PMF) of a Poisson-Binomial distribution/model.

Note

This function calculates the gradient of log_pmf() with respect to the distribution parameter, i.e., the probabilities of success.

Parameters:
  • n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

  • zero_bounds (bool) – if True, single out (set to zero) any entries with a probability of 0 or 1

Returns:

gradient of the log-PMF of the Poisson-Binomial model/distribution.

Raises:
  • TypeError – if n is not a non-negative integer

  • ValueError – if the probability mass function value is zero at n and zero_bounds is False.

sample(sample_size=1)[source]#

Sample a Poisson binomial random variable according to the PMF calculated from all possible values. This requires calculating PMF for all values of the sum (n).

Parameters:

sample_size (int) – size of the sample to generate

Returns:

a sample of n values (the sums of Bernoulli trials) calculated based on the success probabilities of the trials.

Raises:

ValueError – if the sample size is not a positive integer.

expect(func)[source]#

Calculate the expected value of a function (func) which accepts a scalar n (the Bernoulli sum) as parameter/argument.
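
Such an expectation is a finite sum over all possible values of the Bernoulli sum, weighted by the PMF. The sketch below shows this with standalone helpers; the names and the dynamic-programming PMF are assumptions for illustration and are not PyOED's implementation.

```python
def poisson_binomial_pmf(p):
    """Return [P(sum = n) for n = 0..len(p)] (simple dynamic-programming sketch)."""
    pmf = [1.0]
    for pi in p:
        updated = [0.0] * (len(pmf) + 1)
        for n, mass in enumerate(pmf):
            updated[n] += mass * (1.0 - pi)
            updated[n + 1] += mass * pi
        pmf = updated
    return pmf


def expect(func, p):
    """E[func(n)] where n is the sum of independent Bernoulli trials with probabilities p."""
    return sum(func(n) * mass for n, mass in enumerate(poisson_binomial_pmf(p)))


p = [0.1, 0.6, 0.3]
print(expect(lambda n: n, p))       # mean of the sum
print(expect(lambda n: n ** 2, p))  # second moment of the sum
```

As a sanity check, the mean of the sum equals the sum of the success probabilities.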

property R_function#

A handler to the underlying R-Function instance

Conditional Bernoulli Model#

class ConditionalBernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='ConditionalBernoulli: Conditional Bernoulli probability model/distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')[source]#

Bases: BernoulliConfigs

Configurations class for the ConditionalBernoulli abstract base class. This class inherits functionality from PyOEDConfigs and only adds new class-level variables which can be updated as needed.

See PyOEDConfigs for more details on the functionality of this class along with a few additional fields. Otherwise ConditionalBernoulliConfigs provides the following fields:

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables extra functionality in debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (int | None) – random seed used for pseudo random number generation

  • parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution.

  • name (str) – name of the distribution

  • R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.

R_function_evaluation_method: str#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='ConditionalBernoulli: Conditional Bernoulli probability model/distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')#
class ConditionalBernoulli(configs=None)[source]#

Bases: Bernoulli

An implementation of the conditional Bernoulli model. This models a multivariate Bernoulli model conditioned on the sum, that is, the number of active entries.

For details see [1] and [2].

Parameters:

configs (dict | ConditionalBernoulliConfigs | None) – (optional) configurations for the model

References:

  1. Sean X. Chen and Jun S. Liu. “Statistical applications of the Poisson-binomial and conditional Bernoulli distributions.” Statistica Sinica (1997): 875-892.

  2. Ahmed Attia. “Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement.” arXiv preprint arXiv:2406.05830 (2024).

__init__(configs=None)[source]#

Initialize the random number generator

validate_configurations(configs, raise_for_invalid=True)[source]#

Validation stage for the passed configs.

Parameters:

configs (dict | ConditionalBernoulliConfigs) – configurations to validate. If a ConditionalBernoulliConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.

Raises:
update_configurations(**kwargs)[source]#

Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.

coverage_probability(i, n)[source]#

Given the success probabilities, calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1, and size is the dimension of the probability space, that is, the size of p.

inclusion_probability(i, n)#

Given the success probabilities, calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1, and size is the dimension of the probability space, that is, the size of p.
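
To make the notion of an inclusion probability concrete, the sketch below computes it by brute-force enumeration of all subsets of size n. It assumes that n is the sum being conditioned on and that all probabilities are strictly below 1; the function name and signature are hypothetical, and the enumeration is only practical for small sizes, unlike the R-function evaluation used by the library.

```python
from itertools import combinations
from math import prod


def inclusion_probability(i, n, p):
    """P(x_i = 1 | sum(x) = n) for independent Bernoulli entries, by enumeration."""
    w = [pi / (1.0 - pi) for pi in p]  # Bernoulli weights w_i = p_i / (1 - p_i)
    subsets = list(combinations(range(len(p)), n))
    # Each size-n subset has conditional probability proportional to its weight product
    total = sum(prod(w[j] for j in subset) for subset in subsets)
    containing_i = sum(prod(w[j] for j in subset) for subset in subsets if i in subset)
    return containing_i / total


p = [0.2, 0.5, 0.9]
probs = [inclusion_probability(i, 2, p) for i in range(len(p))]
print(probs)
print(sum(probs))
```

The inclusion probabilities over all indices add up to n, since every admissible state has exactly n active entries.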

calculate_w(p, dtype=<class 'float'>, undefined_as_nan=False)[source]#

Calculate weights from success probabilities p.

sum_pmf(n)[source]#

Calculate the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

value of the PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer

sum_log_pmf(n)[source]#

Calculate the log probability (log of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the logarithm of the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

logarithm of the value of the PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer

grad_sum_pmf(n)[source]#

Calculate the derivative/gradient of the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the gradient of the probability mass function (PMF) of a Poisson-Binomial distribution/model.

Note

This function calculates the gradient of sum_pmf() with respect to the distribution parameter, i.e., the probabilities of success.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

gradient of the PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer

grad_sum_log_pmf(n)[source]#

Calculate the derivative/gradient of the log-probability (logarithm of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function models the gradient of the log-probability mass function (log-PMF) of a Poisson-Binomial distribution/model.

Note

This function calculates the gradient of sum_log_pmf() with respect to the distribution parameter, i.e., the probabilities of success.

Parameters:

n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.

Returns:

gradient of the log-PMF of the Poisson-Binomial model/distribution.

Raises:

TypeError – if n is not a non-negative integer

pmf(x, n, batch_as_column=True)[source]#

Calculate the value of the probability mass function (PMF) of a Conditional Bernoulli distribution, evaluated at given binary state/realization x, and with parameters defined by the underlying probability of success.

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • n – non-negative integer defining the sum to condition on.

Returns:

value of the PMF of the CB model (probability of x conditioned on the sum)

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
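
The conditional PMF can be written in terms of the Bernoulli weights \(w_i = p_i/(1-p_i)\) and the R-function \(R(n, S)\), the sum over all size-n subsets of the product of their weights, following the formulation in [1]. The sketch below evaluates this by brute force; the function names are hypothetical, probabilities are assumed to be strictly below 1, and returning zero for states whose sum differs from n is an assumption of this example rather than documented PyOED behavior.

```python
from itertools import combinations, product
from math import prod


def r_function(n, w):
    """R(n, S): sum over all size-n subsets of the product of their weights."""
    return sum(prod(w[i] for i in subset) for subset in combinations(range(len(w)), n))


def conditional_bernoulli_pmf(x, n, p):
    """P(x | sum(x) = n) for independent Bernoulli entries with probabilities p."""
    if sum(x) != n:
        return 0.0  # assumption: states violating the sum constraint get zero mass
    w = [pi / (1.0 - pi) for pi in p]
    # Product of the weights of the active entries, normalized by R(n, S)
    return prod(wi for wi, xi in zip(w, x) if xi) / r_function(n, w)


p = [0.2, 0.5, 0.9]
print(conditional_bernoulli_pmf((0, 1, 1), 2, p))
# The conditional PMF sums to one over all binary states
print(sum(conditional_bernoulli_pmf(x, 2, p) for x in product([0, 1], repeat=len(p))))
```

Enumerating subsets grows combinatorially with the dimension, which is why practical implementations evaluate the R-function with recursive or tabulated schemes instead.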

log_pmf(x, n, batch_as_column=True)[source]#

Calculate the log of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at given binary state or a batch of states (random variable realization) x, and with registered parameters theta.

Note

This method is just a wrapper that chooses either _log_pmf() or _batch_log_pmf() based on whether x is 1d or 2d numpy array, respectively.

Parameters:
  • x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each column is regarded as one instance of the random variable, and the log-PMF is evaluated for each column. To treat rows as instances of the random variable instead, set batch_as_column to False.

  • n – non-negative integer defining the sum to condition on.

  • batch_as_column – only used if x is a 2D array. If True, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance.

Returns:

log-PMF (or batch of log-PMF values) of the CB model

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

grad_pmf(x, n, batch_as_column=True)[source]#

Calculate the gradient of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at given binary state or a batch of states (random variable realization) x, and with parameters theta.

Note

This method is just a wrapper that chooses either _grad_pmf() or _batch_grad_pmf() based on whether x is 1d or 2d numpy array, respectively.

Parameters:
  • x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each column is regarded as one instance of the random variable, and the gradient is evaluated for each column. To treat rows as instances of the random variable instead, set batch_as_column to False.

  • n – non-negative integer defining the sum to condition on.

  • batch_as_column – only used if x is a 2D array. If True, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance.

Returns:

gradient (or batch of gradients) of the probability of the CB model

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

grad_log_pmf(x, n, batch_as_column=True)[source]#

Calculate the gradient of the log-probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at given binary state or a batch of states (random variable realization) x, and with parameters theta.

Note

Given the assumption that the modeled Bernoulli random variables are uncorrelated, the gradient of the log-probability is made up of the partial derivatives of the log-probability of each entry; thus, the result is the same whether joint is True or False.

Parameters:
  • x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each column is regarded as one instance of the random variable, and the gradient is evaluated for each column. To treat rows as instances of the random variable instead, set batch_as_column to False.

  • n – non-negative integer defining the sum to condition on.

  • batch_as_column – only used if x is a 2D array. If True, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance.

Returns:

gradient (or batch of gradients) of the log-probability of the CB model

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

sample(n, sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#

Sample a Conditional Bernoulli random variable (1d or multivariate) according to the probability of success p, that is, \(p := P(x=1)\), of the underlying Bernoulli random variable. If antithetic is True, sample_size must be even.

Parameters:
  • n – non-negative integer defining the sum to condition on.

  • sample_size (int) – size of the sample to generate

  • antithetic (bool) – if True, sample_size must be even

  • dtype (type) – data type of the returned array

  • random_seed (None | int) – the random seed used to initialize the underlying random number generator

Returns:

bernoulli_sample: array of shape sample_size x size, where size is the size of p

Raises:

ValueError – if n is out of the range of possible values or of an invalid type, or if the sample size is not a positive integer.

Note

If p is scalar or iterable of length 1, this will be 1d array of size=sample_size. Otherwise, if p is multivariate, this will be 2d array with each row representing one sample.

expect(func, n, objective_value_tracker=None)[source]#

Calculate the expected value of a function (func) which accepts w as parameter

property poisson_binomial_model#

A handler to the underlying Poisson Binomial model instance

property R_function#

A handler to the underlying R-function instance

class GeneralizedConditionalBernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GeneralizedConditionalBernoulli: Conditional Bernoulli model with multiple budgets', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation', budgets=None)[source]#

Bases: ConditionalBernoulliConfigs

Configurations class for the GeneralizedConditionalBernoulli abstract base class. This class inherits functionality from ConditionalBernoulliConfigs and adds the following attributes/keys.

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables extra functionality in debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • random_seed (int | None) – random seed used for pseudo random number generation

  • parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution.

  • name (str) – name of the distribution

  • R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.

  • budgets (None | Iterable[int]) – None or an iterable (of ints) with allowed/feasible budgets. Any budget must be between 0, and the size of the binary variable (inclusive). If None, no budget-constraint is asserted; this is equivalent to setting budget to include all budgets between 0, and the size of the binary variable (inclusive).

budgets: None | Iterable[int]#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GeneralizedConditionalBernoulli: Conditional Bernoulli model with multiple budgets', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation', budgets=None)#
class GeneralizedConditionalBernoulli(configs=None)[source]#

Bases: ConditionalBernoulli

A generalization of the ConditionalBernoulli model where the sum is allowed to take any value from a set of values rather than just one value.

__init__(configs=None)[source]#

Initialize the random number generator

update_configurations(**kwargs)[source]#

Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.

register_budgets(budgets)[source]#

Set the budget (the sum of the Bernoulli random variable to condition on). The budget can be a single integer or a set of integers. The probability of each budget/size is recalculated. In the former case, the distribution is identical to the parent class. In the latter, the probability is calculated by conditioning on the union of all budgets.

Parameters:

budgets (int | Iterable[int]) – either an integer or an iterable, e.g., a list of integers, defining the acceptable budgets (sums of the Bernoulli random variable).

Raises:

TypeError – if the type of budgets is not acceptable.
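
One way to read "conditioning on the union of all budgets" is that each registered budget receives the Poisson-binomial probability of that sum, renormalized over the registered set. The sketch below illustrates that reading with standalone, hypothetical helpers; it is an interpretation for illustration and does not reproduce PyOED internals.

```python
def poisson_binomial_pmf(p):
    """Return [P(sum = n) for n = 0..len(p)] (simple dynamic-programming sketch)."""
    pmf = [1.0]
    for pi in p:
        updated = [0.0] * (len(pmf) + 1)
        for n, mass in enumerate(pmf):
            updated[n] += mass * (1.0 - pi)
            updated[n + 1] += mass * pi
        pmf = updated
    return pmf


def budget_probabilities(p, budgets):
    """P(sum = n | sum in budgets) for each registered budget n (illustrative reading)."""
    budgets = sorted(set(budgets))
    pmf = poisson_binomial_pmf(p)
    total = sum(pmf[n] for n in budgets)
    if total <= 0:
        raise ValueError("no registered budget has nonzero probability")
    return {n: pmf[n] / total for n in budgets}


p = [0.2, 0.5, 0.9]
print(budget_probabilities(p, [1, 2]))
print(budget_probabilities(p, [2]))  # a single budget carries all the probability
```

With a single budget the conditional probability is 1, which matches the statement that the model then coincides with the parent class.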

check_registered_budgets()[source]#

Check/validate the registered budgets and their probabilities.

Returns:

the registered budgets/sizes and the corresponding probabilities.

Raises:

TypeError – if no valid budget is registered

coverage_probability(i)[source]#

Calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1, and size is the dimension of the probability space, that is, the size of p. This probability is conditioned on the registered budgets.

Note

The inclusion probability is the probability that entry i of a sample equals 1.

pmf(x, batch_as_column=True)[source]#

Calculate the value of the probability mass function (PMF) of a Conditional Bernoulli distribution, evaluated at given binary state/realization x, and with parameters defined by the underlying probability of success. The variable x is conditioned by the registered budget.

Parameters:
  • x – 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each column is regarded as one instance of the random variable, and the PMF is evaluated for each column. To treat rows as instances of the random variable instead, set batch_as_column to False.

  • n – non-negative integer defining the sum to condition on.

  • batch_as_column – only used if x is a 2D array. If True, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance.

Returns:

value of the PMF of the CB model (probability of x conditioned on the sum)

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

log_pmf(x, batch_as_column=True)[source]#

Log-PMF conditioned on the registered budgets. This returns the logarithm of pmf().

grad_log_pmf(x, batch_as_column=True, zero_bounds=True)[source]#

Calculate the gradient of the log-probability mass function (PMF) of a generalized conditional Bernoulli distribution, evaluated at given binary state or a batch of states (random variable realization) x, and with parameters theta.

Note

This method is just a wrapper that chooses either _grad_log_pmf() or _batch_grad_log_pmf() based on whether x is 1d or 2d numpy array, respectively.

Parameters:
  • x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each column is regarded as one instance of the random variable, and the gradient is evaluated for each column. To treat rows as instances of the random variable instead, set batch_as_column to False.

  • n – non-negative integer defining the sum to condition on.

  • batch_as_column – only used if x is a 2D array. If True, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance.

Returns:

gradient (or batch of gradients) of the log-probability of the GCB model

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

grad_pmf(x, batch_as_column=True)[source]#

Calculate the gradient of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at given binary state x, and with parameters theta. The variable x is conditioned by the registered budget.

Parameters:
  • x – scalar or 1D numpy array of binary values (0/1) or bytes

  • n – non-negative integer defining the sum to condition on.

Returns:

gradient of the probability of the CB model

Raises:

TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer

sample(sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#

Sample a Conditional Bernoulli random variable (1D or multivariate) according to the probability of success p, that is, \(p := P(x=1)\), of the underlying Bernoulli random variable. If antithetic is True, sample_size must be even. The random variable is conditioned by the registered budgets.

Note

This is similar to ConditionalBernoulli.sample() except that we replace n with the registered budgets. To sample, we first sample sizes based on the probabilities of each budget, and then sample the CB model conditioned by each sampled size.

Parameters:
  • n – non-negative integer defining the sum to condition on.

  • sample_size (int) – size of the sample to generate

  • antithetic (bool)

  • dtype (type) – data type of the returned array

  • random_seed (None | int) – dictates the random seed used to initialize the underlying random number generator

Returns:

bernoulli_sample: array of shape sample_size x n where n is the size of p

Raises:

ValueError – if sample_size is not a positive integer or if no proper budget is registered with nonzero probability.

Note

If p is scalar or iterable of length 1, this will be 1d array of size=sample_size. Otherwise, if p is multivariate, this will be 2d array with each row representing one sample.
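
The two-stage procedure described in the note above (sample a size, then sample a state conditioned on that size) can be sketched as follows. This is a brute-force illustration that enumerates all supports of a given size, so it is only practical for small dimensions. The function name, the explicit `budget_probs` argument, and the assumption 0 < p < 1 are all hypothetical and not taken from the library.

```python
import itertools
import math

import numpy as np


def sample_generalized_cb(p, budgets, budget_probs, sample_size=1, rng=None):
    """Brute-force two-stage sampler (illustration only; small dimensions).

    Assumes 0 < p_i < 1, every budget <= len(p), and budget_probs summing to 1.
    """
    rng = np.random.default_rng(rng)
    p = np.asarray(p, dtype=float)
    w = p / (1.0 - p)  # Bernoulli weights
    dim = p.size
    if any(n < 0 or n > dim for n in budgets):
        raise ValueError("each budget must be in [0, len(p)]")

    # Stage 1: sample the size of each realization from the budget probabilities.
    sizes = rng.choice(budgets, size=sample_size, p=budget_probs)

    samples = np.zeros((sample_size, dim), dtype=bool)
    for row, n in enumerate(sizes):
        # Stage 2: pick a support of size n with probability proportional
        # to the product of its weights.
        supports = list(itertools.combinations(range(dim), int(n)))
        weights = np.array([math.prod(w[i] for i in s) for s in supports])
        k = rng.choice(len(supports), p=weights / weights.sum())
        for i in supports[k]:
            samples[row, i] = True
    return samples


draws = sample_generalized_cb([0.2, 0.5, 0.7, 0.4], [1, 3], [0.5, 0.5], sample_size=5, rng=0)
print(draws.sum(axis=1))  # every entry is one of the budgets, 1 or 3
```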

expect(func, objective_value_tracker=None)[source]#

Calculate the expected value of a function (func) which accepts w as parameter

property conditional_bernoulli_model#

Return a reference to the underlying Conditional Bernoulli Model

property budgets#

Copy of the budget sizes list

property budgets_probabilities#

Copy of the budget sizes probabilities

Combinatorial Functions#

This module provides access to useful combinatorial functions and tools.

class RFunctionConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='R-Function', method='tabulation')[source]#

Bases: PyOEDConfigs

Configurations class for the RFunction abstract base class. This class inherits functionality from PyOEDConfigs and only adds new class-level variables which can be updated as needed.

See PyOEDConfigs for more details on the functionality of this class along with a few additional fields. Otherwise RFunction provides the following fields:

Parameters:
  • verbose (bool) – a boolean flag to control verbosity of the object.

  • debug (bool) – a boolean flag that enables extra functionality in debug mode

  • output_dir (str | Path) – the base directory where the output files will be saved.

  • name (str) – name of the class. Default is ‘R-Function’.

  • method (str) –

    the method to use for calculating the R-function and its derivative. Only two values are accepted:

    • ’recursion’: The first method, which uses a closed-form recurrence relation.

    • ’tabulation’: (default) The second method where values of the R-function and derivatives are tabulated row-by-row.

name: str#
method: str#
__init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='R-Function', method='tabulation')#
class RFunction(configs=None)[source]#

Bases: PyOEDObject

Implementations of the R-function along with its derivatives. The code here provides two methods for calculating the value of the R-function \(R(n, S)\) for a given set of weights \((w_1, w_2, \ldots, w_{N})\), where \(S:=\{1, 2, \ldots, N\}\).

Parameters:

configs (dict | RFunctionConfigs | None) – (optional) configurations for the R-function. Configurations are ported from RFunctionConfigs.

__init__(configs=None)[source]#
validate_configurations(configs, raise_for_invalid=True)[source]#

Each simulation model SHOULD implement its own function that validates its own configurations. If the validation is self-contained (validates all configurations), then that’s it. However, one can just validate the configurations of the immediate class and call super to validate configurations associated with the parent class.

If one does not wish to do any validation (we strongly advise against that), simply add the signature of this function to the model class.

Note

The purpose of this method is to make sure that the settings in the configurations object self._CONFIGURATIONS are of the right type/values and are compatible with each other. This function is called upon instantiation of the object, and each time a configuration value is updated. Thus, this function needs to be inexpensive and should not do heavy computations.

Parameters:

configs (dict | RFunctionConfigs) – configurations to validate. If a RFunctionConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.

Raises:
  • PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.

  • AttributeError – if any (or a group) of the configurations does not exist in the model configurations RFunctionConfigs.

calculate_w(p, dtype=<class 'decimal.Decimal'>, log=False, undefined_as_nan=False)[source]#

Calculate Bernoulli weights w from success probabilities p. The weights are defined as:

\[w_i = \frac{p_i}{1-p_i}\]

Note

The weights cannot be evaluated for any value of p equal to 1.

Parameters:
  • p (Iterable[float]) – a sequence of success probabilities.

  • dtype (type) – data type (must be a callable that converts its input to the desired data type)

  • log (bool) – return the logarithm of w if True

  • undefined_as_nan (bool) – if True, set the value of w to nan for any value of the probability outside the domain [0, 1)

Returns:

a sequence (of the same length as p) with weights of type decimal.Decimal

Raises:

ValueError – if any of the probabilities are not in the interval [0, 1) and undefined_as_nan is False

Return type:

Iterable[Decimal | float]
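
A minimal sketch of the weight formula above, written as a standalone function. The name `bernoulli_weights` is hypothetical, and the log option of calculate_w() is deliberately left out; the error handling only mirrors what the parameter descriptions state.

```python
from decimal import Decimal


def bernoulli_weights(p, dtype=Decimal, undefined_as_nan=False):
    """Compute w_i = p_i / (1 - p_i) for each success probability p_i.

    Hypothetical stand-in for illustration; not the library's code.
    """
    weights = []
    for p_i in p:
        if 0 <= p_i < 1:
            q = dtype(p_i)
            weights.append(q / (1 - q))
        elif undefined_as_nan:
            # Outside [0, 1) the weight is undefined.
            weights.append(dtype("nan"))
        else:
            raise ValueError(f"probability {p_i} is outside the interval [0, 1)")
    return weights


print(bernoulli_weights([0.5, 0.0]))           # [Decimal('1'), Decimal('0')]
print(bernoulli_weights([0.75], dtype=float))  # [3.0]
```

Using decimal.Decimal by default keeps extra precision in the weights, which matters once they are multiplied together in the R-function.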

evaluate(n, w, log=False, dtype=<class 'decimal.Decimal'>)[source]#

Evaluate the value of the R-function \(R(n, S)\) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\).

Note

This method is a wrapper that calls either evaluate_by_recursion() or evaluate_by_tabulation() based on the registered evaluation method.

Parameters:
  • n (int) – an integer which defines the first argument of the R-function.

  • w (iterable) – the vector of weights derived from Bernoulli trials parameters.

  • log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)

Returns:

the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.

Return type:

decimal.Decimal or a list of decimal.Decimal values

evaluate_by_recursion(n, w, log=False, dtype=<class 'decimal.Decimal'>, enforce_non_negative_R=False)[source]#

Evaluate the value of the R-function \(R(n, S)\) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\). Here, the recursion method is used.

Calculate the R(n, S) function value, where \(S:=\{1, 2, \ldots,N \}\), \(N\) is the length/size of w, and w is the weights vector calculated from the probability of success \(\theta\) of a multivariate Bernoulli distribution as \(w=\frac{\theta}{1-\theta}\). The R-function is given by:

\[R(z, S) := \sum_{B\subseteq S\,; |B|=z} \prod_{i\in B} w_i \,;\, w_i := \frac{\theta_i }{1-\theta_i}\]

This is calculated using the recurrence relation

\[R(z, S) = \frac{1}{z} \sum_{i=1}^{z} (-1)^{i+1} T(i, S) R(z-i, S)\,;\, T(i, S) := \sum_{j\in S} w_j^{i}\]

Warning

This method is numerically unstable, especially for large values of N!

Note

The R function returns very high numbers for large dimensions (nature of combinatorics), and thus one shouldn’t use numpy arrays to store such values. We have to use native Python numbers (and store things in lists).

Parameters:
  • n (int) – an integer which defines the first argument of the R-function.

  • w (iterable) – the vector of weights derived from Bernoulli trials parameters.

  • log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)

Returns:

the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.

Return type:

decimal.Decimal or a list of decimal.Decimal values

Raises:

TypeError – if n is not an integer or w is of an unrecognized type
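
The recurrence relation above can be sketched directly in a few lines. The function name `r_function_by_recursion` is hypothetical, and the sketch skips the log, dtype, and enforce_non_negative_R options; it only illustrates the recurrence, using decimal.Decimal and plain lists as the note recommends.

```python
from decimal import Decimal


def r_function_by_recursion(n, w):
    """Evaluate R(n, S) using the power-sum recurrence (illustration only)."""
    w = [Decimal(w_j) for w_j in w]

    # T(i, S) = sum_j w_j^i for i = 1, ..., n (index 0 is unused).
    T = [None] + [sum(w_j ** i for w_j in w) for i in range(1, n + 1)]

    # R(0, S) = 1; build R(z, S) for z = 1, ..., n from the lower-order values.
    R = [Decimal(1)]
    for z in range(1, n + 1):
        total = sum((-1) ** (i + 1) * T[i] * R[z - i] for i in range(1, z + 1))
        R.append(total / z)
    return R[n]


print(r_function_by_recursion(2, [1, 2, 3]))  # 1*2 + 1*3 + 2*3 = 11
```

The alternating signs in the sum are the source of the cancellation errors mentioned in the warning: for long weight vectors, large terms of opposite sign are subtracted from each other.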

evaluate_by_tabulation(n, w, log=False, dtype=<class 'decimal.Decimal'>)[source]#

Evaluate the value of the R-function \(R(n, S)\) (or its logarithm) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\). Here, the tabulation method is used.

Calculate the R(n, S) function value, where \(S:=\{1, 2, \ldots,N \}\), \(N\) is the length/size of w, and w is the weights vector calculated from the probability of success \(\theta\) of a multivariate Bernoulli distribution as \(w=\frac{\theta}{1-\theta}\). The R-function is given by:

\[R(z, S) := \sum_{B\subseteq S\,; |B|=z} \prod_{i\in B} w_i \,;\, w_i := \frac{\theta_i }{1-\theta_i}\]

This is calculated using a tabulated relationship, with the entry c(i, j) of the table calculated by the following recurrence relation

\[c(i, j) = \frac{1}{z} \sum_{i=1}^{z} (-1)^{i+1} T(i, S) R(z-i, S)\,;\quad T(i, S) := \sum_{j\in S} w_j^{i}\,;\quad i \le j\,,\; i=0, 1, \ldots, N\]

and \(R(z, S)\) is the value in the cell c(z, N), where N is the cardinality of \(S:=\{1, 2, \ldots, N\}\).

Note

The R function returns very high numbers for large dimensions (nature of combinatorics), and thus one can’t use numpy arrays to store such values. We have to use native Python numbers (and store things in lists).

Warning

This method is numerically unstable, especially for large values of N!

Parameters:
  • n (int|None) – if an integer is passed, it must be in the interval [0, N] where N is the size/dimension of the probability distribution. If None, the values of R(n, S) for all possible values of n are returned.

  • w (iterable) – the vector of weights derived from Bernoulli trials parameters.

  • log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)

Returns:

the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.

Return type:

decimal.Decimal or a list of decimal.Decimal values

Raises:
  • TypeError – if n is not an integer or w is of an unrecognized type

  • ValueError – if any of the weights in w fall outside the interval [0, 1].
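
One common way to tabulate the R-function row by row is sketched below. It assumes the table entry c(i, j) holds \(R(i, \{1, \ldots, j\})\) and fills it with the standard dynamic-programming relation \(c(i, j) = c(i, j-1) + w_j\, c(i-1, j-1)\), with \(c(0, j) = 1\) and \(c(i, j) = 0\) for \(i > j\). Both the relation and the function name `r_function_by_tabulation` are assumptions made for illustration; the library's table may be built differently.

```python
from decimal import Decimal


def r_function_by_tabulation(w, n=None):
    """Tabulate R(i, {1..j}) row by row (illustrative sketch only).

    Returns R(n, S) if n is an integer, or the list of R(i, S) for
    i = 0, ..., N if n is None.
    """
    w = [Decimal(w_j) for w_j in w]
    N = len(w)

    # c[i][j] = R(i, {1, ..., j}); entries with i > j stay zero.
    c = [[Decimal(0)] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        c[0][j] = Decimal(1)  # the empty product

    for i in range(1, N + 1):
        for j in range(i, N + 1):
            # Subsets of {1..j} of size i either exclude j or include it.
            c[i][j] = c[i][j - 1] + w[j - 1] * c[i - 1][j - 1]

    last_column = [c[i][N] for i in range(N + 1)]
    return last_column if n is None else last_column[n]


print(r_function_by_tabulation([1, 2, 3]))  # values 1, 6, 11, 6
```

Because every update only adds non-negative terms (for non-negative weights), this form avoids the alternating-sign cancellation of the power-sum recurrence.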

gradient(n, w, dtype=<class 'decimal.Decimal'>, log=False)[source]#

Evaluate the gradient of the R-function \(R(n, S)\), or its logarithm, with respect to the weights w.

Note

This method is a wrapper that calls either gradient_by_recursion() or gradient_by_tabulation() based on the registered evaluation method.

gradient_by_recursion(n, w, dtype=<class 'decimal.Decimal'>, log=False, enforce_non_negative_R=False)[source]#

Evaluate the gradient of the R-function \(R(n, S)\) with respect to the weights w. This method returns the derivative of the result generated by evaluate_by_recursion(), and accepts the same arguments.

Note

If log is True this function returns the gradient of the logarithm of the R-function. This is simply evaluated by applying the rule of derivative of the logarithm.
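
The rule mentioned in the note is \(\nabla_w \log R(n, S) = \nabla_w R(n, S) / R(n, S)\). The sketch below illustrates it using the identity \(\partial R(n, S)/\partial w_i = R(n-1, S\setminus\{i\})\), which follows from the definition because \(R\) is linear in each weight. The helper names are hypothetical, and this is not necessarily how gradient_by_recursion() evaluates the derivative.

```python
from decimal import Decimal


def r_value(n, w):
    """R(n, S) for the weights w, accumulated one weight at a time."""
    if n < 0 or n > len(w):
        return Decimal(0)
    row = [Decimal(1)] + [Decimal(0)] * n
    for w_j in w:
        for i in range(n, 0, -1):
            row[i] += w_j * row[i - 1]
    return row[n]


def r_gradient(n, w, log=False):
    """Gradient of R(n, S), or of log R(n, S), with respect to the weights."""
    w = [Decimal(w_j) for w_j in w]
    # dR(n, S)/dw_i = R(n - 1, S without i)
    grad = [r_value(n - 1, w[:i] + w[i + 1:]) for i in range(len(w))]
    if log:
        # Derivative of the logarithm: divide by the function value.
        value = r_value(n, w)
        grad = [g / value for g in grad]
    return grad


print(r_gradient(2, [1, 2, 3]))            # derivatives 5, 4, 3
print(r_gradient(2, [1, 2, 3], log=True))  # the same values divided by R = 11
```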

gradient_by_tabulation(n, w, dtype=<class 'decimal.Decimal'>, log=False)[source]#

Evaluate the gradient of the R-function \(R(n, S)\) (or its logarithm) with respect to the weights w. This method returns the derivative of the result generated by evaluate_by_tabulation(), and accepts the same arguments.

property verbose#

Screen verbosity of the model

property method#

Return the name of the evaluation method used