Machine Learning Tools#
Reinforcement Learning#
Reinforcement Learning (Policy Gradient) tailored for binary OED optimization.
This work was initially developed under the DOERL package: ahmedattia/doerl
Note
This was part of Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments; see: https://arxiv.org/abs/2101.05958
- class Policy(*args, **kwargs)[source]#
Bases:
object
- sample_action(*args, **kwargs)[source]#
Use the policy probability distribution to sample a single action
- sample_trajectory(*args, **kwargs)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state
- Parameters:
init_state
length – length of the trajectory; number of state-action pairs to be sampled
- Returns:
trajectory: list containing state-action pairs [(state, action), …, (state, action)].
The action associated with the last state does nothing to the corresponding state. The number of entries in the trajectory is equal to the passed length+1; i.e., 'length' states are generated after the initial state, which is used in the first pair of the trajectory.
- conditional_probability(*args, **kwargs)[source]#
Calculate the probability of s1 conditioned by s0; i.e., P(s1 | s0)
- conditional_log_probability_gradient(*args, **kwargs)[source]#
Calculate the gradient of the log-probability of s1 conditioned by s0; i.e., the gradient of log P(s1 | s0) w.r.t. the policy parameters
- property hyperparameters#
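The abstract entries above only expose *args/**kwargs. The sketch below is a minimal, hypothetical skeleton (not the pyoed source) of what a concrete policy is expected to provide; the argument names (init_state, length, s0, s1) are taken from the BernoulliPolicy entry further down, and the dictionary type of hyperparameters is an assumption.

```python
# Hypothetical skeleton of a concrete policy; method names follow the listing
# above. This is an illustration, not the pyoed implementation.
class MyBinaryPolicy:
    def sample_action(self):
        """Draw a single action from the policy distribution."""
        raise NotImplementedError

    def sample_trajectory(self, init_state, length):
        """Return [(state, action), ...] with length + 1 entries,
        starting from init_state."""
        raise NotImplementedError

    def conditional_probability(self, s0, s1, log_probability=True):
        """(Log-)probability of moving from state s0 to state s1."""
        raise NotImplementedError

    def conditional_log_probability_gradient(self, s0, s1):
        """Gradient of log P(s1 | s0) w.r.t. the policy hyperparameters."""
        raise NotImplementedError

    @property
    def hyperparameters(self):
        """Tunable policy parameters (assumed to be returned as a dict)."""
        raise NotImplementedError
```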
- class State(*args, **kwargs)[source]#
Bases:
object
- __init__(*args, **kwargs)[source]#
A class defining a Reinforcement Learning State for the environment
- property size#
Size of the internal state
- property state#
Retrieve the internal state
- class Action[source]#
Bases:
object
- property size#
Size of the internal action
- property action#
Retrieve the internal action
- class Agent(*args, **kwargs)[source]#
Bases:
object
A class defining a Reinforcement Learning Agent
- trajectory_return(*args, **kwargs)[source]#
Given a trajectory (a sequence/list of state-action pairs), calculate the total reward.
- reward(*args, **kwargs)[source]#
Given the current state and action, return the value of the reward function. Inspect the policy if needed.
- property policy#
Handle to the underlying policy
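As a concrete illustration of the assumed semantics of trajectory_return (accumulate the reward over all pairs of a sampled trajectory), here is a standalone sketch; it is not the pyoed implementation, and toy_reward is a placeholder for an actual OED design criterion.

```python
import numpy as np

def toy_reward(state, action):
    # Placeholder reward: penalize the number of active (1) entries in the
    # state, standing in for a real OED utility/criterion.
    return -float(np.sum(state))

def trajectory_return(trajectory, reward=toy_reward):
    # trajectory: list of (state, action) pairs as produced by
    # Policy.sample_trajectory; the return is the accumulated reward.
    return sum(reward(state, action) for state, action in trajectory)

# Example with plain binary vectors standing in for State/Action objects.
traj = [(np.array([1, 0, 1]), np.array([0, 1, 0])),
        (np.array([1, 1, 1]), np.array([0, 0, 0]))]
print(trajectory_return(traj))  # -> -5.0
```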
- class FlipAction(size)[source]#
Bases:
Action
A class defining a Reinforcement Learning Action. An action is a binary vector that describes which state entries are kept as-is and which are flipped.
- Parameters:
size (int) – size of the binary vector/state
- property size#
Size of the internal action
- property action#
Retrieve the internal action
- class BinaryState(size)[source]#
Bases:
State
A class defining a Reinforcement Learning State for the environment
- Parameters:
size (int) – size of the binary vector/state
- update(action, in_place=True)[source]#
Given an action object, update the entries of the internal state and either return self (if in_place is True) or a copy with the updated state
- Parameters:
action (FlipAction) – action to apply to the current state
- Returns:
state resulting from applying the passed action to the current state
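The sketch below illustrates one plausible reading of FlipAction and BinaryState.update: the action is a binary vector, and state entries are flipped wherever the action holds a 1 (an elementwise XOR). The function is an illustration of that reading, not the pyoed source.

```python
import numpy as np

def apply_flip(state, flip_action, in_place=True):
    # state, flip_action: binary (0/1) integer arrays of the same size.
    state = np.asarray(state, dtype=int)
    flip_action = np.asarray(flip_action, dtype=int)
    out = state if in_place else state.copy()
    out ^= flip_action   # flip entries where the action is 1; keep the rest
    return out

# Example: flip the first and last entries of a 4-dimensional binary state.
print(apply_flip(np.array([0, 1, 1, 0]), np.array([1, 0, 0, 1]), in_place=False))
# -> [1 1 1 1]
```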
- class BernoulliPolicy(size, theta=0.5, random_seed=None)[source]#
Bases:
Policy, RandomNumberGenerationMixin
A class defining a Reinforcement Learning policy, based on the Bernoulli distribution
Parameters of the constructor are below.
- Parameters:
size (int) – dimension of the Bernoulli random variable
theta (float) – success probability of the Bernoulli distribution; i.e., the probability that an entry equals 1
random_seed (None|int) – random seed to set the underlying random number generator
- __init__(size, theta=0.5, random_seed=None)[source]#
A class defining a Reinforcement Learning policy
- update_policy_parameters(theta)[source]#
Update theta; that is, the success probability of the underlying Bernoulli distribution
- Parameters:
theta (float) – probability of success
- sample_trajectory(init_state, length)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state
- Parameters:
init_state (State) – initial state of the trajectory to sample
length (int) – length of the trajectory; number of state-action pairs to be sampled
- Returns:
trajectory; list containing state-action pairs [(state, action), …, (state, action)].
Note
The action associated with the last state does nothing to the corresponding state. The number of entries in the trajectory is equal to the passed length+1; i.e., 'length' states are generated after the initial state, which is used in the first pair of the trajectory.
- Raises:
TypeError if init_state is not an instance of the pyoed.ml.reinforcement_learning.State class
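A hypothetical usage snippet follows; the import path is an assumption inferred from the Raises note above and may differ from the actual package layout, and only behavior documented in this listing is exercised.

```python
# Hypothetical usage; the import path is assumed, not confirmed.
from pyoed.ml.reinforcement_learning import BernoulliPolicy, BinaryState

policy = BernoulliPolicy(size=8, theta=0.5, random_seed=123)
init_state = BinaryState(size=8)

# Request 5 state-action pairs after the initial state -> 6 entries in total.
trajectory = policy.sample_trajectory(init_state, length=5)
assert len(trajectory) == 6
```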
- conditional_probability(s0, s1, log_probability=True)[source]#
Calculate the probability of s1 conditioned by s0; i.e., P(s1 | s0). Here, we assume i.i.d. entries of the states
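To make the i.i.d.-Bernoulli assumption concrete, the standalone sketch below computes the probability of a binary state whose entries are each 1 with probability theta, together with the gradient of the log-probability w.r.t. theta (the quantity used in policy-gradient updates). It is one interpretation of the docstrings above, in which s0 does not enter the simplified computation; it is not a copy of the pyoed source.

```python
import numpy as np

def bernoulli_probability(s1, theta, log_probability=True):
    # Probability of the binary vector s1 when each entry is an independent
    # Bernoulli(theta) variable: prod_i theta**s1_i * (1 - theta)**(1 - s1_i).
    s1 = np.asarray(s1, dtype=float)
    log_p = float(np.sum(s1 * np.log(theta) + (1.0 - s1) * np.log(1.0 - theta)))
    return log_p if log_probability else float(np.exp(log_p))

def bernoulli_log_probability_gradient(s1, theta):
    # d/d(theta) log p(s1) = sum_i (s1_i - theta) / (theta * (1 - theta)).
    s1 = np.asarray(s1, dtype=float)
    return float(np.sum((s1 - theta) / (theta * (1.0 - theta))))

# Example: a 4-dimensional all-ones state with theta = 0.25.
print(bernoulli_probability([1, 1, 1, 1], theta=0.25, log_probability=False))
# -> 0.25**4 = 0.00390625
print(bernoulli_log_probability_gradient([1, 1, 1, 1], theta=0.25))
# -> 4 * (1 - 0.25) / (0.25 * 0.75) = 16.0
```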