Machine Learning Tools#
Reinforcement Learning#
Reinforcement Learning (Policy Gradient) tailored for binary OED optimization.
This work was initially developed under the DOERL package: ahmedattia/doerl
Note
This was part of Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments; see: https://arxiv.org/abs/2101.05958
- class Policy(*args, **kwargs)[source]#
Bases:
object
- sample_action(*args, **kwargs)[source]#
Use the policy probability distribution to sample a single action
- sample_trajectory(*args, **kwargs)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state
- Parameters:
init_state
length – length of the trajectory; number of state-action pairs to be sampled
- Returns:
trajectory: list containing state-action pairs [(state, action), …, (state, action)].
The action associated with the last state does nothing to the corresponding state. The number of entries in the trajectory is equal to the passed length+1; i.e., 'length' states are generated after the initial state, which is used in the first pair of the trajectory.
- conditional_probability(*args, **kwargs)[source]#
Calculate the probability of s1 conditioned by s0; i.e., P(s1 | s0)
- conditional_log_probability_gradient(*args, **kwargs)[source]#
Calculate the gradient of the log-probability of s1 conditioned by s0; i.e., the gradient of log P(s1 | s0) w.r.t. the policy parameters
- property hyperparameters#
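The abstract entries above only expose *args/**kwargs. The sketch below is a minimal, hypothetical skeleton (not the pyoed source) of what a concrete policy is expected to provide; the argument names (init_state, length, s0, s1) are taken from the BernoulliPolicy entry further down, and the dictionary type of hyperparameters is an assumption.

```python
# Hypothetical skeleton of a concrete policy; method names follow the listing
# above. This is an illustration, not the pyoed implementation.
class MyBinaryPolicy:
    def sample_action(self):
        """Draw a single action from the policy distribution."""
        raise NotImplementedError

    def sample_trajectory(self, init_state, length):
        """Return [(state, action), ...] with length + 1 entries,
        starting from init_state."""
        raise NotImplementedError

    def conditional_probability(self, s0, s1, log_probability=True):
        """(Log-)probability of moving from state s0 to state s1."""
        raise NotImplementedError

    def conditional_log_probability_gradient(self, s0, s1):
        """Gradient of log P(s1 | s0) w.r.t. the policy hyperparameters."""
        raise NotImplementedError

    @property
    def hyperparameters(self):
        """Tunable policy parameters (assumed to be returned as a dict)."""
        raise NotImplementedError
```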
- class State(*args, **kwargs)[source]#
Bases:
object
- __init__(*args, **kwargs)[source]#
A class defining a Reinforcement Learning State for the environment
- property size#
Size of the internal state
- property state#
Retrieve the internal state
- class Action[source]#
Bases:
object
- property size#
Size of the internal action
- property action#
Retrieve the internal action
- class Agent(*args, **kwargs)[source]#
Bases:
object
A class defining a Reinforcement Learning Agent
- trajectory_return(*args, **kwargs)[source]#
Given a trajectory (a sequence/list of state-action pairs), calculate the total reward.
- reward(*args, **kwargs)[source]#
Given the current state and action, return the value of the reward function. Inspect the policy if needed.
- property policy#
Handle to the underlying policy
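As a concrete illustration of the assumed semantics of trajectory_return (accumulate the reward over all pairs of a sampled trajectory), here is a standalone sketch; it is not the pyoed implementation, and toy_reward is a placeholder for an actual OED design criterion.

```python
import numpy as np

def toy_reward(state, action):
    # Placeholder reward: penalize the number of active (1) entries in the
    # state, standing in for a real OED utility/criterion.
    return -float(np.sum(state))

def trajectory_return(trajectory, reward=toy_reward):
    # trajectory: list of (state, action) pairs as produced by
    # Policy.sample_trajectory; the return is the accumulated reward.
    return sum(reward(state, action) for state, action in trajectory)

# Example with plain binary vectors standing in for State/Action objects.
traj = [(np.array([1, 0, 1]), np.array([0, 1, 0])),
        (np.array([1, 1, 1]), np.array([0, 0, 0]))]
print(trajectory_return(traj))  # -> -5.0
```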
- class FlipAction(size)[source]#
Bases:
Action
A class defining a Reinforcement Learning Action. An action is a binary vector that describes which state entries are kept as-is and which are flipped.
- Parameters:
size (int) – size of the binary vector/state
- property size#
Size of the internal action
- property action#
Retrieve the internal action
- class BinaryState(size)[source]#
Bases:
State
A class defining a Reinforcement Learning State for the environment
- Parameters:
size (int) – size of the binary vector/state
- update(action, in_place=True)[source]#
Given an action object, update the entries of the internal state and either return self (if in_place is True) or a copy with the updated state
- Parameters:
action (FlipAction) – action to apply to the current state
- Returns:
state resulting from applying the passed action to the current state
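The sketch below illustrates one plausible reading of FlipAction and BinaryState.update: the action is a binary vector, and state entries are flipped wherever the action holds a 1 (an elementwise XOR). The function is an illustration of that reading, not the pyoed source.

```python
import numpy as np

def apply_flip(state, flip_action, in_place=True):
    # state, flip_action: binary (0/1) integer arrays of the same size.
    state = np.asarray(state, dtype=int)
    flip_action = np.asarray(flip_action, dtype=int)
    out = state if in_place else state.copy()
    out ^= flip_action   # flip entries where the action is 1; keep the rest
    return out

# Example: flip the first and last entries of a 4-dimensional binary state.
print(apply_flip(np.array([0, 1, 1, 0]), np.array([1, 0, 0, 1]), in_place=False))
# -> [1 1 1 1]
```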
- class BernoulliPolicy(size, theta=0.5, random_seed=None)[source]#
Bases:
Policy, RandomNumberGenerationMixin
A class defining a Reinforcement Learning policy, based on the Bernoulli distribution
Parameters of the constructor are below.
- Parameters:
size (int) – dimension of the Bernoulli random variable
theta (float) – success probability of the Bernoulli distribution; i.e., the probability that an entry equals 1
random_seed (None|int) – random seed to set the underlying random number generator
- __init__(size, theta=0.5, random_seed=None)[source]#
A class defining a Reinforcement Learning policy
- update_policy_parameters(theta)[source]#
Update theta; that is, the success probability of the underlying Bernoulli distribution
- Parameters:
theta (float) – probability of success
- sample_trajectory(init_state, length)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state
- Parameters:
init_state (State) – initial state of the trajectory to sample
length (int) – length of the trajectory; number of state-action pairs to be sampled
- Returns:
trajectory; list containing state-action pairs [(state, action), …, (state, action)].
Note
The action associated with the last state does nothing to the corresponding state. The number of entries in the trajectory is equal to the passed length+1; i.e., 'length' states are generated after the initial state, which is used in the first pair of the trajectory.
- Raises:
TypeError if init_state is not an instance of the pyoed.ml.reinforcement_learning.State class
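A hypothetical usage snippet follows; the import path is an assumption inferred from the Raises note above and may differ from the actual package layout, and only behavior documented in this listing is exercised.

```python
# Hypothetical usage; the import path is assumed, not confirmed.
from pyoed.ml.reinforcement_learning import BernoulliPolicy, BinaryState

policy = BernoulliPolicy(size=8, theta=0.5, random_seed=123)
init_state = BinaryState(size=8)

# Request 5 state-action pairs after the initial state -> 6 entries in total.
trajectory = policy.sample_trajectory(init_state, length=5)
assert len(trajectory) == 6
```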
- conditional_probability(s0, s1, log_probability=True)[source]#
Calculate the probability of s1 conditioned by s0; i.e., P(s1 | s0). Here, we assume i.i.d. entries of the states
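To make the i.i.d.-Bernoulli assumption concrete, the standalone sketch below computes the probability of a binary state whose entries are each 1 with probability theta, together with the gradient of the log-probability w.r.t. theta (the quantity used in policy-gradient updates). It is one interpretation of the docstrings above, in which s0 does not enter the simplified computation; it is not a copy of the pyoed source.

```python
import numpy as np

def bernoulli_probability(s1, theta, log_probability=True):
    # Probability of the binary vector s1 when each entry is an independent
    # Bernoulli(theta) variable: prod_i theta**s1_i * (1 - theta)**(1 - s1_i).
    s1 = np.asarray(s1, dtype=float)
    log_p = float(np.sum(s1 * np.log(theta) + (1.0 - s1) * np.log(1.0 - theta)))
    return log_p if log_probability else float(np.exp(log_p))

def bernoulli_log_probability_gradient(s1, theta):
    # d/d(theta) log p(s1) = sum_i (s1_i - theta) / (theta * (1 - theta)).
    s1 = np.asarray(s1, dtype=float)
    return float(np.sum((s1 - theta) / (theta * (1.0 - theta))))

# Example: a 4-dimensional all-ones state with theta = 0.25.
print(bernoulli_probability([1, 1, 1, 1], theta=0.25, log_probability=False))
# -> 0.25**4 = 0.00390625
print(bernoulli_log_probability_gradient([1, 1, 1, 1], theta=0.25))
# -> 4 * (1 - 0.25) / (0.25 * 0.75) = 16.0
```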