Machine Learning Tools#

Reinforcement Learning#

Reinforcement Learning (Policy Gradient) tailored for binary OED optimization.

Note

This module was developed as part of Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments; see: https://arxiv.org/abs/2101.05958
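The overall scheme is easiest to see end-to-end. Below is a self-contained toy sketch of policy-gradient binary optimization in plain NumPy; the reward function, target vector, and step sizes are made-up placeholders, and the snippet illustrates the idea rather than the pyoed API.

    import numpy as np

    rng = np.random.default_rng(123)
    size, lr, n_samples = 8, 0.05, 64
    target = rng.integers(0, 2, size)               # hypothetical "optimal design"
    reward = lambda x: -np.sum(np.abs(x - target))  # toy reward, maximal at target

    theta = np.full(size, 0.5)                      # Bernoulli parameters p(x_i = 1)
    for _ in range(200):
        grad = np.zeros(size)
        for _ in range(n_samples):
            x = (rng.random(size) < theta).astype(int)  # sample a binary design
            # REINFORCE score: gradient of log p(x) w.r.t. theta
            grad += reward(x) * (x / theta - (1 - x) / (1 - theta))
        theta = np.clip(theta + lr * grad / n_samples, 1e-3, 1 - 1e-3)

    print((theta > 0.5).astype(int), target)        # the two should (often) agree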

class Policy(*args, **kwargs)[source]#

Bases: object

__init__(*args, **kwargs)[source]#

A class defining a Reinforcement Learning policy

update_policy_parameters(*args, **kwargs)[source]#
sample_action(*args, **kwargs)[source]#

Use the policy probability distribution to sample a single action

sample_trajectory(*args, **kwargs)[source]#

Use the policy probability distribution to sample a trajectory, starting from the passed initial state

Parameters:
  • init_state

  • length – length of the trajectory; number of state-action pairs to be sampled

Returns:

trajectory: a list containing state-action pairs [(state, action), …, (state, action)].

The action associated with the last state should do nothing to the corresponding state. The number of entries in the trajectory is equal to the passed length+1, i.e., ‘length’ states are generated after the initial state, which is used in the first pair of the trajectory.

conditional_probability(*args, **kwargs)[source]#

Calculate the probability of s1 conditioned on s0; i.e., p(s1|s0)

conditional_log_probability_gradient(*args, **kwargs)[source]#

Calculate the gradient of the log-probability of s1 conditioned on s0, i.e., log p(s1|s0), w.r.t. the parameters

property hyperparameters#
class State(*args, **kwargs)[source]#

Bases: object

__init__(*args, **kwargs)[source]#

A class defining a Reinforcement Learning State for the environment

property size#

Size of the internal state

property state#

Retrieve the internal state

class Action[source]#

Bases: object

__init__()[source]#

A class defining a Reinforcement Learning Action

property size#

Size of the internal action

property action#

Retrieve the internal action

class Agent(*args, **kwargs)[source]#

Bases: object

A class defining a Reinforcement Learning Agent

__init__(*args, **kwargs)[source]#
trajectory_return(*args, **kwargs)[source]#

Given a trajectory; sequence/list of state-action pairs, calculate the total reward.

reward(*args, **kwargs)[source]#

Given the current state and action, return the value of the reward function. Inspect the policy if needed

initialize_state(*args, **kwargs)[source]#

Initialize the state of the environment

update_state(*args, **kwargs)[source]#

Update the environment state to the passed state

sample_trajectory(*args, **kwargs)[source]#

Sample a trajectory of the passed length

policy_gradient(*args, **kwargs)[source]#

Calculate policy gradient
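As a rough sketch, a REINFORCE-style estimator consistent with the interfaces on this page (a trajectory is a list of (state, action) pairs) could look as follows; the aggregation and normalization details are assumptions, not pyoed's exact implementation.

    # Hedged sketch of a REINFORCE-style estimator; `agent` and `policy`
    # follow the interfaces documented on this page, the rest is assumed.
    def estimate_policy_gradient(agent, policy, init_state, n_trajectories, length):
        grad = 0.0
        for _ in range(n_trajectories):
            tau = policy.sample_trajectory(init_state, length)
            R = agent.trajectory_return(tau)                 # total reward of tau
            for (s0, _), (s1, _) in zip(tau[:-1], tau[1:]):  # consecutive states
                grad = grad + R * policy.conditional_log_probability_gradient(s0, s1)
        return grad / n_trajectories

A train() implementation could then perform gradient ascent with such an estimate, e.g., theta + lr * grad followed by update_policy_parameters(theta); the update rule here is likewise an assumption.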

train(*args, **kwargs)[source]#

Train the agent to optimize the policy

property policy#

Handle to the underlying policy

class FlipAction(size)[source]#

Bases: Action

A class defining a Reinforcement Learning Action. An action is a binary vector that describes which state entries are kept as-is and which are flipped.

Parameters:

size (int) – size of the binary vector/state

__init__(size)[source]#

A class defining a Reinforcement Learning Action

copy()[source]#
property size#

Size of the internal action

property action#

Retrieve the internal action

class BinaryState(size)[source]#

Bases: State

A class defining a Reinforcement Learning State for the environment

Parameters:

size (int) – size of the binary vector/state

__init__(size)[source]#

A class defining a Reinforcement Learning State for the environment

copy()[source]#
update(action, in_place=True)[source]#

Given an action object, update the entries of the internal state, and either return self or a copy with the updated state

Parameters:

action (FlipAction) – action to apply to the current state

Returns:

state resulting by applying the passed action on the current state
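Concretely, applying a flip action amounts to an element-wise XOR; the plain-NumPy snippet below illustrates the semantics described above (it is not the pyoed classes themselves).

    import numpy as np

    state  = np.array([1, 0, 1, 1, 0])   # current binary state
    action = np.array([0, 1, 1, 0, 0])   # 1 = flip this entry, 0 = keep it
    new_state = state ^ action           # element-wise XOR -> [1, 1, 0, 1, 0]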

class BernoulliPolicy(size, theta=0.5, random_seed=None)[source]#

Bases: Policy, RandomNumberGenerationMixin

A class defining a Reinforcement Learning policy, based on the Bernoulli distribution

p(x) = θ^x × (1 − θ)^(1 − x),  x ∈ {0, 1}
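For an i.i.d. multivariate Bernoulli, the joint pmf is the product of the per-entry pmfs above; a small NumPy illustration of the formula (not pyoed code):

    import numpy as np

    theta = 0.5
    x = np.array([1, 0, 1])                       # a binary sample
    p = np.prod(theta**x * (1 - theta)**(1 - x))  # joint pmf: product over entries
    log_p = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))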

Parameters of the constructor are below.

Parameters:
  • size (int) – dimension of the Bernoulli random variable

  • theta (float) – success probability: p(x=1)

  • random_seed (None|int) – random seed to set the underlying random number generator

__init__(size, theta=0.5, random_seed=None)[source]#

A class defining a Reinforcement Learning policy

update_policy_parameters(theta)[source]#

Update theta; that is, p(x=1)

Parameters:

theta (float) – probability of success p(x=1)

sample_action()[source]#

Sample an appropriate action

sample_trajectory(init_state, length)[source]#

Use the policy probability distribution to sample a trajectory, starting from the passed initial state

Parameters:
  • init_state (State) – initial state of the trajectory to sample

  • length (int) – length of the trajectory; number of state-action pairs to be sampled

Returns:

trajectory: a list containing state-action pairs [(state, action), …, (state, action)].

Note

  • The action associated with the last state should do nothing to the corresponding state.

  • The number of entries in the trajectory is equal to the passed length+1, i.e., ‘length’ states are generated after the initial state, which is used in the first pair of the trajectory.

Raises:

TypeError – if init_state is not an instance of the pyoed.ml.reinforcement_learning.State class
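A hedged usage sketch built only from the signatures on this page; the import path follows the Raises note above, and the initial contents of BinaryState are whatever its constructor sets up.

    from pyoed.ml.reinforcement_learning import BernoulliPolicy, BinaryState

    policy = BernoulliPolicy(size=5, theta=0.5, random_seed=42)
    init_state = BinaryState(size=5)
    trajectory = policy.sample_trajectory(init_state, length=10)
    assert len(trajectory) == 11            # length + 1 state-action pairs
    for state, action in trajectory:
        print(state.state, action.action)   # documented properties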

conditional_probability(s0, s1, log_probability=True)[source]#

Calculate the probability of s1 conditioned on s0; i.e., p(s1|s0).

Here, we assume i.i.d. entries of the states

Parameters:
  • s0 (State) – Multivariate Bernoulli state

  • s1 (State) – Multivariate Bernoulli state

  • log_probability (bool) – whether to return the probability or the log-probability

Returns:

p: value of the conditional probability
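Under the flip-action model above, each entry of s1 differs from s0 exactly when the corresponding Bernoulli(θ) action entry is a flip, so the conditional probability factorizes over entries. The sketch below follows that reading; it is an assumption consistent with this page, not pyoed's exact code.

    import numpy as np

    def conditional_probability(s0, s1, theta, log_probability=True):
        flipped = (s0 != s1).astype(float)   # 1 where the entry changed
        log_p = np.sum(flipped * np.log(theta) + (1 - flipped) * np.log(1 - theta))
        return log_p if log_probability else np.exp(log_p)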

conditional_log_probability_gradient(s0, s1)[source]#

Calculate the gradient of the log-probability of s1 conditioned on s0, i.e., log p(s1|s0), w.r.t. the parameters θ. Here, we assume i.i.d. entries of the states

Parameters:
  • s0 (State) – Multivariate Bernoulli state

  • s1 (State) – Multivariate Bernoulli state

Returns:

log_prob_grad: a vector containing the derivatives of the log-probability of the multivariate Bernoulli w.r.t. the parameters θ
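Differentiating the factorized log-probability from the conditional-probability sketch w.r.t. θ gives, per entry, flipped/θ − (1 − flipped)/(1 − θ); again a sketch under the same flip-model assumption, not pyoed's exact code.

    import numpy as np

    def conditional_log_probability_gradient(s0, s1, theta):
        flipped = (s0 != s1).astype(float)   # 1 where the entry changed
        return flipped / theta - (1 - flipped) / (1 - theta)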