Probability Distributions/Models#
Note
All distribution classes inherit the
distribution base class (Distribution)
and each distribution class/object is associated with a configurations class derived from
the distribution configurations base class (DistributionConfigs)
Multivariate Bernoulli Model#
- class BernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Mulitvariate Bernoulli Distribution (iid)', random_seed=None, parameter=0.5)[source]#
Bases:
DistributionConfigs
Configurations class for the Bernoulli class. This class inherits functionality from DistributionConfigs and only adds new class-level variables which can be updated as needed.
See DistributionConfigs for more details on the functionality of this class along with a few additional fields. Otherwise, Bernoulli provides the following fields:
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables extra functionality in debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (int | None) – random seed used for pseudo random number generation
name (str) – name of the distribution
parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution
- parameter: float | Iterable[float]#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Mulitvariate Bernoulli Distribution (iid)', random_seed=None, parameter=0.5)#
- class Bernoulli(configs=None)[source]#
Bases:
Distribution
An implementation of the multivariate Bernoulli Distribution with independent components (no covariances).
- Parameters:
configs (dict | BernoulliConfigs | None) – (optional) configurations for the model
- validate_configurations(configs, raise_for_invalid=True)[source]#
Validation stage for the passed configs.
- Parameters:
configs (dict | BernoulliConfigs) – configurations to validate. If a BernoulliConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.
- Raises:
PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.
AttributeError – if one or more of the configurations do not exist in the model configurations BernoulliConfigs.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.
- sample(sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#
Sample a Bernoulli random variable (1d or multivariate) according to the probability of success p, that is, \(p := P(x=1)\). If antithetic is True, sample_size must be even.
- Parameters:
sample_size (int) – size of the sample to generate
antithetic (bool)
dtype (type) – data type of the returned array
- Returns:
bernoulli_sample: array of shape sample_size x n, where n is the size of p
- Raises:
TypeError – if the sample_size is invalid
Note
If p is a scalar or an iterable of length 1, the result is a 1d array of size sample_size. Otherwise, if p is multivariate, the result is a 2d array with each row representing one sample.
- expect(func, objective_value_tracker=None)[source]#
Calculate the expected value of a function (func) which accepts w as parameter
- pmf(x, joint=True)[source]#
Calculate the value of the probability mass function (PMF) of an uncorrelated multivariate Bernoulli distribution, evaluated at a given binary state/realization x, with parameters defined by the underlying probability of success.
- Parameters:
x – scalar, or 1D numpy array, or binary values (0/1) or bytes
joint (bool) – if True joint PMF value is returned, otherwise marginal PMF for all entries is returned.
- Returns:
joint PMF (if joint is True; default) or the marginal PMF values (if joint is False)
- Raises:
TypeError
if the passed x has wrong shape/size
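The difference between the joint and marginal PMF can be illustrated with a minimal standalone sketch. It does not call PyOED; the function name `bernoulli_pmf` and its argument layout are hypothetical and chosen only for illustration, under the stated assumption that the components are independent.

```python
import math


def bernoulli_pmf(x, p, joint=True):
    """PMF of independent Bernoulli components evaluated at the binary state x.

    Hypothetical helper for illustration; not part of the PyOED API.
    """
    # Marginal PMF of entry i: p_i if x_i == 1, otherwise 1 - p_i
    marginals = [pi if xi else 1.0 - pi for xi, pi in zip(x, p)]
    # Under independence the joint PMF is the product of the marginals
    return math.prod(marginals) if joint else marginals


p = [0.2, 0.5, 0.9]
x = [1, 0, 1]
print(bernoulli_pmf(x, p))               # product 0.2 * 0.5 * 0.9
print(bernoulli_pmf(x, p, joint=False))  # one marginal value per entry
```

With joint=False the sketch returns one value per entry, which mirrors the "marginal PMF for all entries" wording above.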
- log_pmf(x, joint=True)[source]#
Calculate the value of the log-PMF (logarithm of the probability mass function) of an uncorrelated multivariate Bernoulli distribution.
- Parameters:
x – scalar, or 1D numpy array, or binary values (0/1) or bytes
joint (bool) – if True joint PMF value is returned, otherwise marginal PMF for all entries is returned.
- Returns:
logarithm of the joint PMF (if joint is True; default) or the logarithmic values of the marginal PMF (if joint is False)
- Raises:
TypeError
if the passed x has wrong shape/size
- grad_pmf(x, joint=True)[source]#
Calculate the gradient of the probability mass function (PMF) of an uncorrelated multivariate Bernoulli distribution, evaluated at a given binary state x, with parameters theta. The derivative is taken with respect to the distribution parameters (success probabilities), not the realization of the random variable x.
- Parameters:
x – scalar, or 1D numpy array, or binary values (0/1) or bytes
joint (bool) – if True joint PMF value is returned, otherwise marginal PMF for all entries is returned.
- Returns:
gradient of the joint PMF (if joint is True; default) or the gradient of the marginal PMF values (if joint is False)
- Raises:
TypeError
if the passed x has wrong shape/size
- grad_log_pmf(x, joint=True, zero_bounds=True)[source]#
Calculate the gradient of the log-PMF (logarithm of the probability mass function) with respect to the distribution parameters (success probabilities).
- Parameters:
x – scalar, or 1D numpy array, or binary values (0/1) or bytes
joint (bool) – if True joint PMF is considered, otherwise marginal PMF is used.
- Returns:
the gradient of the log probability
Note
Because the modeled Bernoulli random variables are assumed uncorrelated, the partial derivative of the joint log-probability with respect to each success probability equals the derivative of the log-probability of the corresponding entry; thus, the result is the same whether joint is True or False.
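The note above can be checked numerically. The sketch below is standalone and does not use PyOED; `log_pmf` and `grad_log_pmf` are hypothetical helpers written for independent components. It uses the closed form 1/p_i when x_i = 1 and -1/(1 - p_i) when x_i = 0, and compares it with a central finite difference of the joint log-PMF.

```python
import math


def log_pmf(x, p):
    """Joint log-PMF of independent Bernoulli components (illustrative helper)."""
    return sum(math.log(pi if xi else 1.0 - pi) for xi, pi in zip(x, p))


def grad_log_pmf(x, p):
    """Gradient of the joint log-PMF with respect to the success probabilities."""
    # Entry i depends only on p_i: 1/p_i if x_i == 1, else -1/(1 - p_i)
    return [1.0 / pi if xi else -1.0 / (1.0 - pi) for xi, pi in zip(x, p)]


p = [0.2, 0.5, 0.8]
x = [1, 0, 1]
grad = grad_log_pmf(x, p)

# Central finite-difference approximation of the same gradient
eps = 1e-6
fd = []
for i in range(len(p)):
    upper = p[:i] + [p[i] + eps] + p[i + 1:]
    lower = p[:i] + [p[i] - eps] + p[i + 1:]
    fd.append((log_pmf(x, upper) - log_pmf(x, lower)) / (2 * eps))

print(grad)
print(fd)
```

The sketch assumes every p_i lies strictly between 0 and 1; how the library treats the boundary values (see zero_bounds) is not reproduced here.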
- index_to_binary_state(k, dtype=<class 'bool'>)[source]#
Return the binary state \((v_1, v_2, \dots)\), whose dimension equals the size of this distribution, corresponding to index k.
Note
This is a wrapper around the utility function pyoed.utility.math.index_to_binary_state, added here only for convenience.
- index_from_binary_state(state)[source]#
Reverse of index_to_binary_state(): return the index k corresponding to the passed state (of dimension equal to size).
Note
This is a wrapper around the utility function pyoed.utility.math.index_from_binary_state, added here only for convenience.
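The mapping between an integer index and a binary state can be sketched as follows. This is a standalone illustration, not the PyOED implementation: the helper names are hypothetical, and both the zero-based index and the least-significant-bit-first ordering are assumptions that may differ from the convention used by pyoed.utility.math.

```python
def index_to_state(k, size, dtype=bool):
    """Binary state of length `size` for index k.

    Assumed convention (not confirmed by the docs): zero-based index,
    with entry i holding bit i of k (least-significant bit first).
    """
    if not 0 <= k < 2 ** size:
        raise ValueError("index out of range for the given size")
    return [dtype((k >> i) & 1) for i in range(size)]


def index_from_state(state):
    """Inverse mapping: rebuild the index from a binary state."""
    return sum(1 << i for i, v in enumerate(state) if v)


# Round trip over all 2**3 states of a 3-dimensional distribution
for k in range(8):
    state = index_to_state(k, size=3, dtype=int)
    print(k, state, index_from_state(state))
```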
- property parameter#
Return the underlying probability of success
- property success_probability#
Return the underlying probability of success
- property size#
Return the dimension (size) of the underlying probability space
Poisson Binomial Distribution#
- class PoissonBinomialConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Poisson Binomial Distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')[source]#
Bases:
BernoulliConfigs
Configurations class for the PoissonBinomial class. This class inherits functionality from PyOEDConfigs and only adds new class-level variables which can be updated as needed.
See PyOEDConfigs for more details on the functionality of this class along with a few additional fields. Otherwise, PoissonBinomial provides the following fields:
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables extra functionality in debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (int | None) – random seed used for pseudo random number generation
parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution
name (str) – name of the distribution
R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.
- R_function_evaluation_method: str#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='Poisson Binomial Distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')#
- class PoissonBinomial(configs=None)[source]#
Bases:
Bernoulli
An implementation of the Poisson-binomial distribution, which models the sum of independent (non-identical) Bernoulli trials. This version uses the Discrete Fourier Transform to calculate probabilities and derivatives following Method 1 in [1], which evaluates \(R(n, S)\) as a series. For details see [2].
- Parameters:
configs (dict | PoissonBinomialConfigs | None) – (optional) configurations for the model
References:
[1] Sean X. Chen and Jun S. Liu. “Statistical applications of the Poisson-binomial and conditional Bernoulli distributions.” Statistica Sinica (1997): 875-892.
[2] Ahmed Attia. “Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement.” arXiv preprint arXiv:2406.05830 (2024).
- validate_configurations(configs, raise_for_invalid=True)[source]#
Validation stage for the passed configs.
- Parameters:
configs (dict | PoissonBinomialConfigs) – configurations to validate. If a PoissonBinomialConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.
- Raises:
PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.
AttributeError – if one or more of the configurations do not exist in the model configurations PoissonBinomialConfigs.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.
- calculate_w(p, dtype=<class 'decimal.Decimal'>, log=False, undefined_as_nan=False)[source]#
Calculate Bernoulli weights w from success probabilities p. The weights are defined as:
\[w_i = \frac{p_i}{1-p_i}\]
Note
The weights cannot be evaluated for any value of p equal to 1.
Note
This is a wrapper around RFunction.calculate_w().
- Parameters:
p (Iterable[float]) – a sequence of success probabilities.
dtype (type) – data type (must be a callable that converts the input to the desired data type)
log (bool) – return the logarithm of w if True
undefined_as_nan (bool) – if True, set the value of w to nan for any value of the probability outside the domain [0, 1)
- Returns:
a sequence (of the same length as p) with weights of type decimal.Decimal
- Raises:
ValueError – if any of the probabilities are not in the interval [0, 1) and undefined_as_nan is False
- Return type:
Iterable[Decimal | float]
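The weight formula can be sketched with the standard-library decimal module. This is an illustration only, not a call into RFunction.calculate_w(): the helper name `calculate_weights` is hypothetical, and converting each probability through its string form is a choice made here, not something the documentation specifies. The undefined_as_nan option is not reproduced.

```python
from decimal import Decimal


def calculate_weights(p, log=False):
    """Weights w_i = p_i / (1 - p_i) as Decimal values (illustrative helper)."""
    weights = []
    for pi in p:
        # Assumption of this sketch: build the Decimal from the printed value
        pi = Decimal(str(pi))
        if not (Decimal(0) <= pi < Decimal(1)):
            # w is undefined at p = 1 and probabilities must lie in [0, 1)
            raise ValueError(f"probability {pi} is outside [0, 1)")
        w = pi / (Decimal(1) - pi)
        weights.append(w.ln() if log else w)
    return weights


print(calculate_weights([0.5, 0.2]))
```

Working in Decimal keeps the weights accurate when some p_i are very close to 1, where p_i / (1 - p_i) grows without bound.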
- pmf(n)[source]#
Calculate the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the probability mass function (PMF) of a Poisson-Binomial distribution/model.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
value of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
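The relation between the PMF of the sum and the R-function can be sketched without PyOED. The sketch assumes that \(R(n, S)\) is the elementary symmetric polynomial of degree n in the weights \(w_i = p_i/(1-p_i)\), so that \(P(N=n) = R(n, S)\prod_i (1-p_i)\), and it evaluates R with the power-sum recursion (Newton's identities). Whether PoissonBinomial.pmf() follows exactly these steps internally is an assumption; the function names are hypothetical.

```python
import math


def r_function(n, w):
    """R(n, S) for weights w, via the power-sum recursion (Newton's identities).

    R(k) = (1/k) * sum_{i=1..k} (-1)**(i+1) * T(i) * R(k-i),  T(i) = sum_j w_j**i
    """
    T = [sum(wj ** i for wj in w) for i in range(n + 1)]
    R = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        R[k] = sum((-1) ** (i + 1) * T[i] * R[k - i] for i in range(1, k + 1)) / k
    return R[n]


def poisson_binomial_pmf(n, p):
    """P(sum of independent Bernoulli(p_i) == n); requires every p_i < 1."""
    if not isinstance(n, int) or n < 0:
        raise TypeError("n must be a non-negative integer")
    w = [pi / (1.0 - pi) for pi in p]
    return r_function(n, w) * math.prod(1.0 - pi for pi in p)


p = [0.1, 0.4, 0.7]
pmf = [poisson_binomial_pmf(n, p) for n in range(len(p) + 1)]
print(pmf, sum(pmf))
```

The alternating series can lose precision for large dimensions or weights of very different magnitude, which is one reason to prefer higher-precision arithmetic or a tabulation scheme in practice.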
- log_pmf(n)[source]#
Calculate the log probability (log of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the logarithm of the probability mass function (PMF) of a Poisson-Binomial distribution/model.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
logarithm of the value of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError – if n is not a non-negative integer
ValueError – if the probability mass function value is zero at n.
- grad_pmf(n)[source]#
Calculate the derivative/gradient of the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the gradient of the probability mass function (PMF) of a Poisson-Binomial distribution/model.
Note
This function calculates the gradient of sum_pmf() with respect to the distribution parameter, i.e., the probabilities of success.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
gradient of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
- grad_log_pmf(n, zero_bounds=True)[source]#
Calculate the derivative/gradient of the log-probability (logarithm of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the gradient of the log-probability mass function (log-PMF) of a Poisson-Binomial distribution/model.
Note
This function calculates the gradient of sum_log_pmf() with respect to the distribution parameter, i.e., the probabilities of success.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
zero_bounds (bool) – if True, single out (set to zero) any entries with probability 0 or 1
- Returns:
gradient of the log-PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError – if n is not a non-negative integer
ValueError – if the probability mass function value is zero at n and zero_bounds is False.
- sample(sample_size=1)[source]#
Sample a Poisson binomial random variable according to the PMF calculated from all possible values. This requires calculating PMF for all values of the sum (n).
- Parameters:
sample_size (int) – size of the sample to generate
- Returns:
a sample of n values (the sum of Bernoulli trials) calculated based on the success probabilities of the trials.
- Raises:
ValueError
if the sample size is not a positive integer.
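Sampling from the full PMF of the sum can be sketched as follows. The sketch is standalone: it tabulates the PMF for every n with a simple dynamic-programming recursion (a technique swapped in here for brevity, not the DFT approach mentioned above) and then draws values of n with the standard-library random module. The names `poisson_binomial_table` and `sample_sum`, and the `seed` argument, are hypothetical.

```python
import random


def poisson_binomial_table(p):
    """PMF of the Bernoulli sum for n = 0..len(p), folding in one trial at a time."""
    pmf = [1.0]
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for n, q in enumerate(pmf):
            nxt[n] += q * (1.0 - pi)   # trial fails: the sum is unchanged
            nxt[n + 1] += q * pi       # trial succeeds: the sum grows by one
        pmf = nxt
    return pmf


def sample_sum(p, sample_size=1, seed=None):
    """Draw `sample_size` values of the sum according to the tabulated PMF."""
    if not isinstance(sample_size, int) or sample_size < 1:
        raise ValueError("sample_size must be a positive integer")
    pmf = poisson_binomial_table(p)
    rng = random.Random(seed)
    return rng.choices(range(len(pmf)), weights=pmf, k=sample_size)


print(poisson_binomial_table([0.5, 0.5]))
print(sample_sum([0.1, 0.4, 0.7], sample_size=5, seed=0))
```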
- expect(func)[source]#
Calculate the expected value of a function (func) which accepts a scalar n (the Bernoulli sum) as its parameter/argument.
- property R_function#
A handler to the underlying R-Function instance
Conditional Bernoulli Model#
- class ConditionalBernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='ConditionalBernoulli: Conditional Bernoulli probability model/distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')[source]#
Bases:
BernoulliConfigs
Configurations class for the ConditionalBernoulli class. This class inherits functionality from PyOEDConfigs and only adds new class-level variables which can be updated as needed.
See PyOEDConfigs for more details on the functionality of this class along with a few additional fields. Otherwise, ConditionalBernoulliConfigs provides the following fields:
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables extra functionality in debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (int | None) – random seed used for pseudo random number generation
parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution
name (str) – name of the distribution
R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.
- R_function_evaluation_method: str#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='ConditionalBernoulli: Conditional Bernoulli probability model/distribution', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation')#
- class ConditionalBernoulli(configs=None)[source]#
Bases:
Bernoulli
An implementation of the conditional Bernoulli model. This models a multivariate Bernoulli random variable conditioned on the sum, that is, on the number of active entries.
- Parameters:
configs (dict | ConditionalBernoulliConfigs | None) – (optional) configurations for the model
References:
Sean X. Chen and Jun S. Liu. “Statistical applications of the Poisson-binomial and conditional Bernoulli distributions.” Statistica Sinica (1997): 875-892.
Ahmed Attia. “Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement.” arXiv preprint arXiv:2406.05830 (2024).
- validate_configurations(configs, raise_for_invalid=True)[source]#
Validation stage for the passed configs.
- Parameters:
configs (dict | ConditionalBernoulliConfigs) – configurations to validate. If a ConditionalBernoulliConfigs object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.
- Raises:
PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.
AttributeError – if one or more of the configurations do not exist in the model configurations ConditionalBernoulliConfigs.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.
- coverage_probability(i, n)[source]#
Given success probability, calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1 where size is the dimension of the probability space, that is the size of p.
- inclusion_probability(i, n)#
Given success probability, calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1 where size is the dimension of the probability space, that is the size of p.
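The inclusion probability can be sketched with the R-function. The sketch assumes the standard conditional Bernoulli identity \(\pi_i(n) = w_i\, R(n-1, S\setminus\{i\}) / R(n, S)\) with \(w_i = p_i/(1-p_i)\), and evaluates R as an elementary symmetric polynomial by a simple in-place recursion. It is standalone; the helper names are hypothetical and the library may compute the quantity differently.

```python
def esp(w, n):
    """Elementary symmetric polynomial of degree n in w, i.e. R(n, S)."""
    e = [1.0] + [0.0] * n
    for wj in w:
        # Update from high degree to low so each weight is used at most once
        for k in range(n, 0, -1):
            e[k] += wj * e[k - 1]
    return e[n]


def inclusion_probability(i, n, p):
    """P(x_i = 1 | sum(x) = n) for independent Bernoulli(p_j), all p_j < 1."""
    if not (isinstance(n, int) and 1 <= n <= len(p)):
        raise ValueError("this sketch requires 1 <= n <= len(p)")
    w = [pj / (1.0 - pj) for pj in p]
    rest = w[:i] + w[i + 1:]          # weights of S without index i
    return w[i] * esp(rest, n - 1) / esp(w, n)


p = [0.2, 0.5, 0.8]
probs = [inclusion_probability(i, 2, p) for i in range(len(p))]
print(probs, sum(probs))  # the inclusion probabilities add up to n
```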
- calculate_w(p, dtype=<class 'float'>, undefined_as_nan=False)[source]#
Calculate weights from success probabilities p.
- sum_pmf(n)[source]#
Calculate the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the probability mass function (PMF) of a Poisson-Binomial distribution/model.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
value of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
- sum_log_pmf(n)[source]#
Calculate the log probability (log of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the logarithm of the probability mass function (PMF) of a Poisson-Binomial distribution/model.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
logarithm of the value of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
- grad_sum_pmf(n)[source]#
Calculate the derivative/gradient of the probability (probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the gradient of the probability mass function (PMF) of a Poisson-Binomial distribution/model.
Note
This function calculates the gradient of sum_pmf() with respect to the distribution parameter, i.e., the probabilities of success.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
gradient of the PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
- grad_sum_log_pmf(n)[source]#
Calculate the derivative/gradient of the log-probability (logarithm of the probability mass function) of the sum of the multivariate Bernoulli distribution. This function evaluates the gradient of the log-probability mass function (log-PMF) of a Poisson-Binomial distribution/model.
Note
This function calculates the gradient of sum_log_pmf() with respect to the distribution parameter, i.e., the probabilities of success.
- Parameters:
n – non-negative integer defining the sum (number of nonzero entries) of a multivariate Bernoulli random variable.
- Returns:
gradient of the log-PMF of the Poisson-Binomial model/distribution.
- Raises:
TypeError
if n is not non-negative integer
- pmf(x, n, batch_as_column=True)[source]#
Calculate the value of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at a given binary state/realization x, with parameters defined by the underlying probability of success.
- Parameters:
x – scalar, or 1D numpy array, or binary values (0/1) or bytes
n – non-negative integer defining the sum to condition on.
- Returns:
value of the PMF of the CB model (probability of x conditioned on the sum)
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
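The conditional PMF can be sketched directly from the weights. The sketch assumes the standard conditional Bernoulli form \(P(x \mid \sum_i x_i = n) = \prod_i w_i^{x_i} / R(n, S)\) for states whose entries sum to n, and zero otherwise. It is standalone, handles a single 1D state only, and uses hypothetical helper names.

```python
import math


def esp(w, n):
    """Elementary symmetric polynomial of degree n in w, i.e. R(n, S)."""
    e = [1.0] + [0.0] * n
    for wj in w:
        for k in range(n, 0, -1):
            e[k] += wj * e[k - 1]
    return e[n]


def conditional_bernoulli_pmf(x, n, p):
    """P(x | sum(x) = n) for independent Bernoulli(p_i), all p_i < 1."""
    if len(x) != len(p):
        raise TypeError("x must have the same size as p")
    if not isinstance(n, int) or n < 0:
        raise TypeError("n must be a non-negative integer")
    if sum(x) != n:
        return 0.0                     # state is inconsistent with the sum
    w = [pi / (1.0 - pi) for pi in p]
    numerator = math.prod(wi for wi, xi in zip(w, x) if xi)
    return numerator / esp(w, n)


p = [0.2, 0.5, 0.8]
print(conditional_bernoulli_pmf([0, 1, 1], 2, p))
print(conditional_bernoulli_pmf([1, 1, 1], 2, p))  # sum is 3, not 2
```

Summing the sketch over all binary states with exactly n active entries gives 1, which is a quick sanity check for the normalization by \(R(n, S)\).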
- log_pmf(x, n, batch_as_column=True)[source]#
Calculate the log of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at a given binary state or a batch of states (random variable realizations) x, with registered parameters theta.
Note
This method is just a wrapper that chooses either _log_pmf() or _batch_log_pmf() based on whether x is a 1d or 2d numpy array, respectively.
- Parameters:
x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each COLUMN is regarded as one instance of the random variable, and the log-pmf is evaluated for each column. If you want rows to be regarded as instances of the random variable, switch batch_as_column to False.
n – non-negative integer defining the sum to condition on.
batch_as_column – only used if x is a 2d array. If True and x is two dimensional, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance of the random variable.
- Returns:
log-pmf (or batch of log-pmf values) of the probability of the CB model
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
- grad_pmf(x, n, batch_as_column=True)[source]#
Calculate the gradient of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at a given binary state or a batch of states (random variable realizations) x, with parameters theta.
Note
This method is just a wrapper that chooses either _grad_pmf() or _batch_grad_pmf() based on whether x is a 1d or 2d numpy array, respectively.
- Parameters:
x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each COLUMN is regarded as one instance of the random variable, and the gradient is evaluated for each column. If you want rows to be regarded as instances of the random variable, switch batch_as_column to False.
n – non-negative integer defining the sum to condition on.
batch_as_column – only used if x is a 2d array. If True and x is two dimensional, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance of the random variable.
- Returns:
gradient (or batch of gradients) of the probability of the CB model
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
- grad_log_pmf(x, n, batch_as_column=True)[source]#
Calculate the gradient of the log-probability mass function (log-PMF) of a conditional Bernoulli distribution, evaluated at a given binary state or a batch of states (random variable realizations) x, with parameters theta.
Note
Given the assumption that the modeled Bernoulli RVs are uncorrelated, the gradient of the log-probabilities is the same as the partial derivatives of the log-probability of each entry; thus, whether joint is True or False the result is the same.
- Parameters:
x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each COLUMN is regarded as one instance of the random variable, and the gradient is evaluated for each column. If you want rows to be regarded as instances of the random variable, switch batch_as_column to False.
n – non-negative integer defining the sum to condition on.
batch_as_column – only used if x is a 2d array. If True and x is two dimensional, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance of the random variable.
- Returns:
gradient (or batch of gradients) of the log-probability of the CB model
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
- sample(n, sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#
Sample a conditional Bernoulli random variable (1d or multivariate) according to the probability of success p, that is, \(p := P(x=1)\), of the underlying Bernoulli random variable. If antithetic is True, sample_size must be even.
- Parameters:
n – non-negative integer defining the sum to condition on.
sample_size (int) – size of the sample to generate
antithetic (bool)
dtype (type) – data type of the returned array
random_seed (None | int) – the random seed used to initialize the underlying random number generator
- Returns:
bernoulli_sample: array of shape sample_size x size, where size is the size of p
- Raises:
ValueError – if n is out of the range of possible values or of an invalid type, or if the sample size is not a positive integer.
Note
If p is scalar or iterable of length 1, this will be 1d array of size=sample_size. Otherwise, if p is multivariate, this will be 2d array with each row representing one sample.
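A simple way to see what such a sample looks like is rejection sampling: draw independent Bernoulli states and keep only those whose entries sum to n. This is a swapped-in technique chosen for brevity, not the sampling procedure used by ConditionalBernoulli.sample(), and it ignores the antithetic and dtype options. The function name and the `seed` and `max_tries` arguments are hypothetical.

```python
import random


def sample_conditional_bernoulli(p, n, sample_size=1, seed=None, max_tries=100_000):
    """Rejection sampler for independent Bernoulli(p_i) conditioned on sum == n."""
    if not (isinstance(n, int) and 0 <= n <= len(p)):
        raise ValueError("n must be an integer between 0 and len(p)")
    if not isinstance(sample_size, int) or sample_size < 1:
        raise ValueError("sample_size must be a positive integer")
    rng = random.Random(seed)
    samples, tries = [], 0
    while len(samples) < sample_size:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for rejection sampling")
        x = [rng.random() < pi for pi in p]   # unconditional Bernoulli draw
        if sum(x) == n:                       # accept only states with n active entries
            samples.append(x)
    return samples


for row in sample_conditional_bernoulli([0.2, 0.5, 0.8], n=2, sample_size=4, seed=0):
    print(row)
```

Each returned row is one sample of length len(p), matching the row-per-sample layout described in the note above. Rejection sampling becomes inefficient when the probability of the sum being n is small.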
- expect(func, n, objective_value_tracker=None)[source]#
Calculate the expected value of a function (func) which accepts w as parameter
- property poisson_binomial_model#
A handler to the underlying Poisson Binomial model instance
- property R_function#
A handler to the underlying R-function instance
- class GeneralizedConditionalBernoulliConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GeneralizedConditionalBernoulli: Conditional Bernoulli model with multiple budgets', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation', budgets=None)[source]#
Bases:
ConditionalBernoulliConfigs
Configurations class for the GeneralizedConditionalBernoulli class. This class inherits functionality from ConditionalBernoulliConfigs in addition to the following attributes/keys.
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables extra functionality in debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
random_seed (int | None) – random seed used for pseudo random number generation
parameter (float | Iterable[float]) – probability of success of the Bernoulli trials. This determines the dimension of the probability distribution
name (str) – name of the distribution
R_function_evaluation_method (str) – the name of the evaluation method of the R-function. See RFunctionConfigs for supported evaluation methods.
budgets (None | Iterable[int]) – None or an iterable (of ints) with allowed/feasible budgets. Any budget must be between 0 and the size of the binary variable (inclusive). If None, no budget constraint is asserted; this is equivalent to setting budgets to include all values between 0 and the size of the binary variable (inclusive).
- budgets: None | Iterable[int]#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='GeneralizedConditionalBernoulli: Conditional Bernoulli model with multiple budgets', random_seed=None, parameter=0.5, R_function_evaluation_method='tabulation', budgets=None)#
- class GeneralizedConditionalBernoulli(configs=None)[source]#
Bases:
ConditionalBernoulli
A generalization of the ConditionalBernoulli model where the sum is allowed to be a set of values rather than just one value.
- update_configurations(**kwargs)[source]#
Take any set of keyword arguments, look up each in the configurations, and update as necessary/possible/valid.
- register_budgets(budgets)[source]#
Set the budget (the sum of the Bernoulli random variable to condition on). The budget can be a single integer or a set of integers. The probability of each budget/size is recalculated. In the former case, the distribution is identical to the parent class. In the latter, the probability is calculated by conditioning on the union of all budgets.
- Parameters:
budgets (int|iterable(int)) – either an integer or an iterable e.g., list of integers, defining acceptable budgets (sum of the Bernoulli random variable).
- Raises:
TypeError
if the type of budgets is not acceptable.
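Conditioning on a union of budgets can be sketched by renormalizing the Poisson-binomial PMF over the registered budgets, so that each budget b receives probability \(P(N=b) / \sum_{b' \in B} P(N=b')\). The sketch is standalone and assumes this is what "recalculating the probability of each budget" amounts to; the helper names, and the use of ValueError for out-of-range budgets, are choices made here.

```python
def poisson_binomial_table(p):
    """PMF of the Bernoulli sum for n = 0..len(p), folding in one trial at a time."""
    pmf = [1.0]
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for n, q in enumerate(pmf):
            nxt[n] += q * (1.0 - pi)
            nxt[n + 1] += q * pi
        pmf = nxt
    return pmf


def budget_probabilities(p, budgets):
    """Probability of each budget given that the sum lies in the set of budgets."""
    budgets = [budgets] if isinstance(budgets, int) else list(budgets)
    if not all(isinstance(b, int) for b in budgets):
        raise TypeError("budgets must be an integer or an iterable of integers")
    budgets = sorted(set(budgets))
    if any(not 0 <= b <= len(p) for b in budgets):
        raise ValueError("each budget must lie between 0 and len(p) (inclusive)")
    table = poisson_binomial_table(p)
    total = sum(table[b] for b in budgets)
    if total == 0.0:
        raise ValueError("the registered budgets have zero total probability")
    return {b: table[b] / total for b in budgets}


print(budget_probabilities([0.5, 0.5], [0, 1]))
print(budget_probabilities([0.5, 0.5], 1))  # a single budget gets probability 1
```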
- check_registered_budgets()[source]#
Check/validate registered budgets and their probabilities.
- Returns:
the registered budgets/sizes and the corresponding probabilities.
- Raises:
TypeError
if no valid budget is registered
- coverage_probability(i)[source]#
Calculate the inclusion probability (coverage probability) for index i, where the index starts at 0 and ranges to size-1, with size being the dimension of the probability space, that is, the size of p. This probability is conditioned on the registered budgets.
Note
The inclusion probability is the probability that 1 appears at index i of a sampled state.
- pmf(x, batch_as_column=True)[source]#
Calculate the value of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at a given binary state/realization x, with parameters defined by the underlying probability of success. The variable x is conditioned on the registered budget.
- Parameters:
x – 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each COLUMN is regarded as one instance of the random variable, and the PMF is evaluated for each column. If you want rows to be regarded as instances of the random variable, switch batch_as_column to False.
n – non-negative integer defining the sum to condition on.
batch_as_column – only used if x is a 2d array. If True and x is two dimensional, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance of the random variable.
- Returns:
value of the PMF of the CB model (probability of x conditioned on the sum)
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
- log_pmf(x, batch_as_column=True)[source]#
log-PMF conditioned on the registered budgets. This returns the logarithm of pmf().
- grad_log_pmf(x, batch_as_column=True, zero_bounds=True)[source]#
Calculate the gradient of the log-probability mass function (log-PMF) of a generalized conditional Bernoulli distribution, evaluated at a given binary state or a batch of states (random variable realizations) x, with parameters theta.
Note
This method is just a wrapper that chooses either _grad_log_pmf() or _batch_grad_log_pmf() based on whether x is a 1d or 2d numpy array, respectively.
- Parameters:
x – scalar, or 1D or 2D numpy array of binary values (0/1) or bytes. If x is a 2D array, each COLUMN is regarded as one instance of the random variable, and the gradient is evaluated for each column. If you want rows to be regarded as instances of the random variable, switch batch_as_column to False.
n – non-negative integer defining the sum to condition on.
batch_as_column – only used if x is a 2d array. If True and x is two dimensional, each column is regarded as an instance of the random variable (default); otherwise, each row is taken as an instance of the random variable.
- Returns:
gradient (or batch of gradients) of the log-probability of the GCB model
- Raises:
TypeError – if the passed x has the wrong shape/size and/or n is not a non-negative integer
- grad_pmf(x, batch_as_column=True)[source]#
Calculate the gradient of the probability mass function (PMF) of a conditional Bernoulli distribution, evaluated at given binary state x, and with parameters theta. The variable
x
is conditioned by the registered budget.- Parameters:
x – scalar, or 1D numpy array, or binary values (0/10) or bytes
n – non-negative integer defining the sum to condition on.
- Returns:
gradient of the probabiltiy of the CB model
- Raises:
TypeError – if the passed x has wrong shape/size and/or n is not a non-negative integer
- sample(sample_size=1, antithetic=False, dtype=<class 'bool'>)[source]#
Sample a Conditional Bernoulli random variable (1d or multivariate) according to probability of success p, that is \(p:=P(x=1)\), of the underlying Bernoulli random variable. If antithetic is True, sample_size must be even. The random variable is conditioned by the registered budgets.
Note
This is similar to
ConditionalBernoulli.sample()
except that we replace n with the registered budgets. To sample, we first sample sizes based on the probabilities of each budget, and then sample the CB model conditioned by each sample size.
- Parameters:
n – non-negative integer defining the sum to condition on.
sample_size (int) – size of the sample to generate
antithetic (bool)
dtype (type) – data type of the returned array
random_seed –
None|int
dictates the random seed to be used to initialize the underlying random number generator
- Returns:
bernoulli_sample: array of shape
sample_size x n
where
n
is the size of p
- Raises:
ValueError – if the sample_size is not a positive integer or if no proper budget is registered with nonzero probabilities.
Note
If p is scalar or iterable of length 1, this will be 1d array of size=sample_size. Otherwise, if p is multivariate, this will be 2d array with each row representing one sample.
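The two-stage procedure described in the note above (draw a budget size, then draw a state conditioned on that size) can be sketched as follows. This is an illustrative brute-force version for small dimensions, not the library's sampler: it enumerates all subsets of the drawn size and picks one with probability proportional to the product of its weights. The function name and arguments are hypothetical, and antithetic sampling is not covered.

```python
import random
from itertools import combinations
from math import prod


def sample_gcb(theta, budgets, probabilities, sample_size=1, seed=None):
    """Draw binary states whose sums follow the given budget probabilities."""
    rng = random.Random(seed)
    w = [t / (1 - t) for t in theta]  # Bernoulli weights, requires theta_i < 1
    indices = range(len(theta))
    samples = []
    for _ in range(sample_size):
        # Stage 1: draw the budget (number of active entries)
        n = rng.choices(budgets, weights=probabilities)[0]
        # Stage 2: draw a size-n subset with probability proportional to prod(w_i)
        subsets = list(combinations(indices, n))
        subset_weights = [prod(w[i] for i in subset) for subset in subsets]
        chosen = rng.choices(subsets, weights=subset_weights)[0]
        samples.append([i in chosen for i in indices])
    return samples


samples = sample_gcb([0.5, 0.2, 0.75], budgets=[1, 2],
                     probabilities=[0.4, 0.6], sample_size=5, seed=0)
for row in samples:
    print(row)
```

Each row is one sample, matching the row-per-sample layout described above for the multivariate case.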
- expect(func, objective_value_tracker=None)[source]#
Calculate the expected value of a function (func) which accepts w as parameter
- property conditional_bernoulli_model#
Return a reference to the underlying Conditional Bernoulli Model
- property budgets#
Copy of the budget sizes list
- property budgets_probabilities#
Copy of the budget sizes probabilities
Combinatorial Functions#
This module provides access to useful combinatorial functions and tools.
- class RFunctionConfigs(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='R-Function', method='tabulation')[source]#
Bases:
PyOEDConfigs
Configurations class for the
RFunction
abstract base class. This class inherits functionality from
PyOEDConfigs
and only adds new class-level variables which can be updated as needed.
See
PyOEDConfigs
for more details on the functionality of this class along with a few additional fields. Otherwise,
RFunction
provides the following fields:
- Parameters:
verbose (bool) – a boolean flag to control verbosity of the object.
debug (bool) – a boolean flag that enables adding extra functionality in a debug mode
output_dir (str | Path) – the base directory where the output files will be saved.
name (str) – name of the class. Default is ‘R-Function’.
method (str) –
the method to use for calculating the R-function and its derivative. Only two values are accepted:
‘recursion’: the first method, which uses a closed-form recurrence relation.
‘tabulation’: (default) the second method, where values of the R-function and its derivatives are tabulated row-by-row.
- name: str#
- method: str#
- __init__(*, debug=False, verbose=False, output_dir='./_PYOED_RESULTS_', name='R-Function', method='tabulation')#
- class RFunction(configs=None)[source]#
Bases:
PyOEDObject
Implementation of the R-function along with its derivatives. The code here provides two methods for calculating the value of the R-function \(R(n, S)\) for a given set of weights \((w_1, w_2, \ldots, w_N)\), where \(S:=\{1, 2, \ldots, N\}\).
- Parameters:
configs (dict | RFunctionConfigs | None) – (optional) configurations for the R-function. Configurations are ported from
RFunctionConfigs
.
- validate_configurations(configs, raise_for_invalid=True)[source]#
Each simulation model SHOULD implement its own function that validates its own configurations. If the validation is self-contained (validates all configurations), then that’s it. However, one can just validate the configurations of the immediate class and call super to validate configurations associated with the parent class.
If one does not wish to do any validation (we strongly advise against that), simply add the signature of this function to the model class.
Note
The purpose of this method is to make sure that the settings in the configurations object self._CONFIGURATIONS are of the right type/values and are conformable with each other. This function is called upon instantiation of the object, and each time a configuration value is updated. Thus, this function needs to be inexpensive and should not do heavy computations.
- Parameters:
configs (dict | RFunctionConfigs) – configurations to validate. If a
RFunctionConfigs
object is passed, validation is performed on the entire set of configurations. However, if a dictionary is passed, validation is performed only on the configurations corresponding to the keys in the dictionary.
- Raises:
PyOEDConfigsValidationError – if the configurations are invalid and raise_for_invalid is set to True.
AttributeError – if any (or a group) of the configurations does not exist in the model configurations
RFunctionConfigs
.
- calculate_w(p, dtype=<class 'decimal.Decimal'>, log=False, undefined_as_nan=False)[source]#
Calculate Bernoulli weights w from success probabilities p. The weights are defined as:
\[w_i = \frac{p_i}{1-p_i}\]
Note
The weights cannot be evaluated for any value of p equal to 1.
- Parameters:
p (Iterable[float]) – a sequence of success probabilities.
dtype (type) – data type (must be a callable that converts its input to the desired data type)
log (bool) – return the logarithm of w if True
undefined_as_nan (bool) – if True, set the value of w to nan for any value of the probability outside the domain [0, 1)
- Returns:
a sequence (of the same length as p) with weights of type decimal.Decimal
- Raises:
ValueError – if any of the probabilities are not in the interval [0, 1) and undefined_as_nan is False
- Return type:
Iterable[Decimal | float]
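For reference, the weight formula above can be reproduced in a few lines of standard-library Python. This is a simplified sketch, not the method itself: the helper name `bernoulli_weights` is hypothetical, and the `log` and `undefined_as_nan` options are left out.

```python
from decimal import Decimal


def bernoulli_weights(p):
    """Return w_i = p_i / (1 - p_i) as Decimal values for probabilities in [0, 1)."""
    weights = []
    for p_i in p:
        if not 0 <= p_i < 1:
            raise ValueError(f"probability {p_i} is outside the interval [0, 1)")
        # Go through str so that Decimal sees 0.2 rather than its binary float expansion
        p_dec = Decimal(str(p_i))
        weights.append(p_dec / (1 - p_dec))
    return weights


weights = bernoulli_weights([0.5, 0.2, 0.75])
print(weights)
```

A probability of exactly 1 is rejected because the denominator 1 - p_i vanishes, which is the restriction stated in the note above.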
- evaluate(n, w, log=False, dtype=<class 'decimal.Decimal'>)[source]#
Evaluate the value of the R-function \(R(n, S)\) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\).
Note
This method is a wrapper that calls either
evaluate_by_recursion()
or
evaluate_by_tabulation()
based on the registered evaluation method.
- Parameters:
n (int) – an integer which defines the first argument of the R-function.
w (iterable) – the vector of weights derived from Bernoulli trials parameters.
log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)
- Returns:
the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.
- Return type:
decimal.Decimal
or a list of decimal.Decimal values
- evaluate_by_recursion(n, w, log=False, dtype=<class 'decimal.Decimal'>, enforce_non_negative_R=False)[source]#
Evaluate the value of the R-function \(R(n, S)\) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\). Here, the recursion method is used.
Calculate the value of \(R(n, S)\), where \(S:=\{1, 2, \ldots, N\}\), \(N\) is the length/size of w, and w is the weights vector calculated from the success probabilities \(\theta\) of a multivariate Bernoulli distribution as \(w=\frac{\theta}{1-\theta}\). The R-function is given by:
\[R(n, S) := \sum_{B\subseteq S\,;\, |B|=n} \prod_{i\in B} w_i \,;\quad w_i := \frac{\theta_i}{1-\theta_i}\]
This is calculated using the recurrence relation
\[R(n, S) = \frac{1}{n} \sum_{i=1}^{n} (-1)^{i+1} T(i, S) R(n-i, S)\,;\quad T(i, S) := \sum_{j\in S} w_j^{i}\]
Warning
This method is numerically unstable, especially for large values of N!
Note
The R function returns very high numbers for large dimensions (nature of combinatorics), and thus one shouldn’t use numpy arrays to store such values. We have to use native Python numbers (and store things in lists).
- Parameters:
n (int) – an integer which defines the first argument of the R-function.
w (iterable) – the vector of weights derived from Bernoulli trials parameters.
log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)
- Returns:
the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.
- Return type:
decimal.Decimal
or a list of decimal.Decimal values
- Raises:
TypeError – if n is not an integer or w is of unrecognized type
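The recurrence can be checked on a small example against the subset-sum definition of the R-function. The sketch below is illustrative only: the function names are hypothetical, and it uses `fractions.Fraction` rather than `decimal.Decimal` so that both computations are exact and can be compared with `==`.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def r_brute_force(n, w):
    """R(n, S) by definition: sum over size-n subsets of the product of weights."""
    return sum(prod(subset) for subset in combinations(w, n))


def r_by_recursion(n, w):
    """R(n, S) = (1/n) * sum_{i=1..n} (-1)^(i+1) * T(i, S) * R(n-i, S), with R(0, S) = 1."""
    values = [Fraction(1)]  # R(0, S) = 1
    for k in range(1, n + 1):
        total = Fraction(0)
        for i in range(1, k + 1):
            power_sum = sum(w_j ** i for w_j in w)  # T(i, S)
            total += (-1) ** (i + 1) * power_sum * values[k - i]
        values.append(total / k)
    return values[n]


w = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]
recursive_values = [r_by_recursion(n, w) for n in range(len(w) + 1)]
print(recursive_values)
```

The alternating signs in the sum are the source of the cancellation that makes the recurrence unstable in floating point; exact rational arithmetic sidesteps that here at the cost of speed.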
- evaluate_by_tabulation(n, w, log=False, dtype=<class 'decimal.Decimal'>)[source]#
Evaluate the value of the R-function \(R(n, S)\) (or its logarithm) where \(S=\{1, 2, \ldots, N\}\), with \(N\) being the length/size of the weights vector \(w\). Here, the tabulation method is used.
Calculate the value of \(R(n, S)\), where \(S:=\{1, 2, \ldots, N\}\), \(N\) is the length/size of w, and w is the weights vector calculated from the success probabilities \(\theta\) of a multivariate Bernoulli distribution as \(w=\frac{\theta}{1-\theta}\). The R-function is given by:
\[R(n, S) := \sum_{B\subseteq S\,;\, |B|=n} \prod_{i\in B} w_i \,;\quad w_i := \frac{\theta_i}{1-\theta_i}\]
This is calculated using a tabulated relationship with the c(i, j) entry of the table calculated by the following recurrence relation
\[c(i, j) = \frac{1}{n} \sum_{i=1}^{n} (-1)^{i+1} T(i, S) R(n-i, S)\,;\quad T(i, S) := \sum_{j\in S} w_j^{i}\,;\quad i \le j\,,\; i=0, 1, \ldots, N\]
and \(R(n, S)\) is the value in the cell c(n, ) and \(N\) is the cardinality of \(S:=\{1, 2, \ldots, N\}\).
Note
The R function returns very high numbers for large dimensions (nature of combinatorics), and thus one can’t use numpy arrays to store such values. We have to use native Python numbers (and store things in lists).
Warning
This method is numerically unstable, especially for large values of N!
- Parameters:
n (int|None) – if an integer is passed, it must be in the interval [0, N] where N is the size/dimension of the probability distribution. If None, the values of R(n, S) for all possible values of n are returned.
w (iterable) – the vector of weights derived from Bernoulli trials parameters.
log (bool) – if True return the logarithm (natural logarithm) log(R(n, S)), otherwise return the value of R(n, S)
- Returns:
the value(s) of R(n, S) either as a scalar (if n is not None) or a sequence if n is None.
- Return type:
decimal.Decimal
or a list of decimal.Decimal values
- Raises:
TypeError – if n is not an integer or w is of unrecognized type
ValueError – if any of the weights in w fall outside the interval [0, 1].
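One common way to tabulate the R-function is to add the weights one at a time, using the identity R(k, {1..j}) = R(k, {1..j-1}) + w_j R(k-1, {1..j-1}). The sketch below uses that scheme as an assumption; it is not necessarily the table layout this method builds. The function name `r_table` is hypothetical, and `fractions.Fraction` is used so the result is exact.

```python
from fractions import Fraction


def r_table(w):
    """Return [R(0, S), R(1, S), ..., R(N, S)] by adding one weight at a time."""
    size = len(w)
    table = [Fraction(1)] + [Fraction(0)] * size  # R(0, {}) = 1, R(k, {}) = 0 for k > 0
    for w_j in w:
        # Walk k downward so table[k - 1] still refers to the set without w_j
        for k in range(size, 0, -1):
            table[k] += w_j * table[k - 1]
    return table


w = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]
all_values = r_table(w)
print(all_values)
```

Because every update only adds non-negative terms, this scheme avoids the alternating-sign cancellation of the power-sum recurrence, and one pass yields the values for every n at once (the case n=None above).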
- gradient(n, w, dtype=<class 'decimal.Decimal'>, log=False)[source]#
Evaluate the gradient of the R-function \(R(n, S)\), or its logarithm, with respect to the weights w.
Note
This method is a wrapper that calls either
gradient_by_recursion()
or
gradient_by_tabulation()
based on the registered evaluation method.
- gradient_by_recursion(n, w, dtype=<class 'decimal.Decimal'>, log=False, enforce_non_negative_R=False)[source]#
Evaluate the gradient of the R-function \(R(n, S)\) with respect to the weights w. This method returns the derivative of the result generated by
evaluate_by_recursion()
, and accepts the same arguments.
Note
If log is True this function returns the gradient of the logarithm of the R-function. This is simply evaluated by applying the rule of derivative of the logarithm.
- gradient_by_tabulation(n, w, dtype=<class 'decimal.Decimal'>, log=False)[source]#
Evaluate the gradient of the R-function \(R(n, S)\) (or its logarithm) with respect to the weights w. This method returns the derivative of the result generated by
evaluate_by_tabulation()
, and accepts the same arguments.
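Since the R-function is linear in each individual weight, its partial derivative with respect to w_i is the R-function of order n-1 over the remaining weights, and the gradient of the logarithm follows by dividing by R(n, S). The sketch below illustrates this with brute-force evaluation and exact fractions. The function names are hypothetical and this is not the library's gradient code.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def r_value(n, w):
    """Brute-force R(n, S); defined as 0 for negative n."""
    if n < 0:
        return Fraction(0)
    return sum(prod(subset) for subset in combinations(w, n))


def r_gradient(n, w, log=False):
    """Gradient of R(n, S), or of log R(n, S), with respect to the weights w."""
    # dR(n, S)/dw_i = R(n - 1, S without i)
    grad = [r_value(n - 1, w[:i] + w[i + 1:]) for i in range(len(w))]
    if log:
        # d log R / dw_i = (dR/dw_i) / R
        r_n = r_value(n, w)
        grad = [g / r_n for g in grad]
    return grad


w = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]
grad = r_gradient(2, w)
log_grad = r_gradient(2, w, log=True)
print(grad)
print(log_grad)
```

Linearity in each weight also means a one-sided difference quotient in a single w_i reproduces the derivative exactly when evaluated with rational arithmetic, which gives a simple correctness check.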
- property verbose#
Screen verbosity of the model
- property method#
Return the name of the evaluation method used