Machine Learning Tools#

This package provides machine learning components for solving OED problems, currently focused on reinforcement learning (RL). The RL approach treats OED as a sequential decision-making problem where a learned policy maps the current experimental design (state) to a modification (action).

Note

This page is also accessible from the Optimization subpackage, where ReinforceOptimizer and its binary variants consume these components.

Key Classes at a Glance#

pyoed.ml.reinforcement_learning.Policy

Abstract base class defining a Reinforcement Learning policy for OED.

pyoed.ml.reinforcement_learning.State

Abstract base class defining a Reinforcement Learning state.

pyoed.ml.reinforcement_learning.Action

Abstract base class defining a Reinforcement Learning action.

pyoed.ml.reinforcement_learning.Agent

Abstract base class defining a Reinforcement Learning agent.

pyoed.ml.reinforcement_learning.BernoulliPolicy

A class defining a Reinforcement Learning policy based on the Bernoulli distribution.

pyoed.ml.reinforcement_learning.BinaryState

A Reinforcement Learning state represented as a binary vector.

pyoed.ml.reinforcement_learning.FlipAction

A Reinforcement Learning action represented as a binary flip vector.

Reinforcement Learning Concepts in PyOED#

The PyOED RL framework maps directly onto standard RL terminology:

RL concept | PyOED class         | Description
-----------|---------------------|------------------------------------------------------------
State      | State               | Current experimental design vector \(\zeta\)
Action     | Action              | Modification to the design (e.g., flip a sensor on/off)
Policy     | Policy              | Distribution over actions given the current state
Agent      | Agent               | Runs episodes, accumulates rewards, and updates the policy via REINFORCE
Reward     | OED criterion value | How good the current design is (e.g., negative A-optimality)

Built-in policies:

  • BernoulliPolicy — independent Bernoulli distribution per sensor; suitable for unconstrained binary OED.

Built-in states / actions:

  • BinaryState — binary design vector with copy() and update() helpers.

  • FlipAction — flips a single entry of the binary design.
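To make the mapping above concrete, here is a minimal, self-contained sketch of a REINFORCE-style loop with an independent Bernoulli policy. It does not use PyOED. The function names, the toy reward (a weighted count of active sensors standing in for an OED criterion), the mean-reward baseline, the step size, and the clipping bounds are all assumptions made for illustration.

```python
import random


def sample_design(theta, rng):
    """Policy: switch each sensor on independently with probability theta[i]."""
    return [1 if rng.random() < t else 0 for t in theta]


def reward(design, weights):
    """Toy stand-in for an OED criterion: weighted count of active sensors."""
    return sum(w * x for w, x in zip(weights, design))


def reinforce(weights, iterations=200, batch=32, step=0.05, seed=0):
    rng = random.Random(seed)
    theta = [0.5] * len(weights)  # policy parameters, one per sensor
    for _ in range(iterations):
        designs = [sample_design(theta, rng) for _ in range(batch)]
        rewards = [reward(d, weights) for d in designs]
        baseline = sum(rewards) / batch  # variance-reduction baseline
        for i, t in enumerate(theta):
            # Monte Carlo estimate of d E[reward] / d theta_i, using the
            # Bernoulli score (x - theta) / (theta * (1 - theta)).
            grad = sum(
                (r - baseline) * (d[i] - t) / (t * (1.0 - t))
                for d, r in zip(designs, rewards)
            ) / batch
            # Gradient ascent step, clipped away from 0 and 1.
            theta[i] = min(0.95, max(0.05, t + step * grad))
    return theta


theta = reinforce(weights=[1.0, -1.0, 1.0, -1.0])
print([round(t, 2) for t in theta])
```

With this toy reward, the parameters of sensors with positive weight should drift toward the upper bound and the others toward the lower bound. A real OED criterion would replace `reward`.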

Reinforcement Learning#

Reinforcement Learning (Policy Gradient) tailored for binary OED optimization.

This module provides base classes for RL-based OED, including abstract Policy, State, and Action classes that define the RL interface for experimental design problems.

This work was developed initially under the DOERL package: ahmedattia/doerl

Note

This was part of Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments; see: https://arxiv.org/abs/2101.05958

class Policy(*args, **kwargs)[source]#

Bases: object

Abstract base class defining a Reinforcement Learning policy for OED.

A policy maps the current state of the environment (experiment) to a probability distribution over actions (design decisions). Subclasses must implement all methods.

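import numpy as np
from pyoed.ml.reinforcement_learning import Policy

class MyPolicy(Policy):
    def __init__(self, size):
        self._size = size
        # One Bernoulli parameter per design entry, initialized to 0.5
        self._params = np.ones(size) * 0.5

    def update_policy_parameters(self, new_params):
        # Replace the stored parameters (e.g., after a gradient step)
        self._params = new_params

    def sample_action(self, state):
        # Draw one independent 0/1 sample per entry; `state` is unused here
        return np.random.binomial(1, self._params)
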
__init__(*args, **kwargs)[source]#

Initialize the policy. Must be implemented by subclasses.

update_policy_parameters(*args, **kwargs)[source]#

Update the policy parameters (e.g., after a gradient step).

Parameters:
  • args – implementation-specific positional arguments.

  • kwargs – implementation-specific keyword arguments.

sample_action(*args, **kwargs)[source]#

Use the policy probability distribution to sample a single action.

Returns:

the sampled action.

sample_trajectory(*args, **kwargs)[source]#

Use the policy probability distribution to sample a trajectory, starting from the passed initial state.

Parameters:
  • init_state – the initial state to start the trajectory from.

  • length – length of the trajectory; number of state-action pairs to be sampled.

Returns:

trajectory as a list containing state-action pairs [(state, action), ..., (state, action)]. The action associated with the last state should do nothing to the corresponding state. The number of entries in the trajectory is equal to length + 1, i.e., length states are generated after the initial state which is used in the first pair in the trajectory.

Return type:

list[tuple]
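The trajectory layout described above can be sketched in plain Python as follows. This is an illustration only, not the PyOED implementation: states are plain lists of 0/1, actions are lists of booleans, and drawing each flip with a fixed `flip_probability` is a hypothetical choice made to keep the example short.

```python
import random


def sample_trajectory(init_state, length, flip_probability=0.5, seed=None):
    """Return [(state, action), ...] with length + 1 entries."""
    rng = random.Random(seed)
    state = list(init_state)
    trajectory = []
    for _ in range(length):
        # Action: one boolean per entry, True means "flip this entry".
        action = [rng.random() < flip_probability for _ in state]
        trajectory.append((state, action))
        # Next state: flip the entries selected by the action.
        state = [1 - s if a else s for s, a in zip(state, action)]
    # Final pair: the last state with an action that changes nothing.
    trajectory.append((state, [False] * len(state)))
    return trajectory


traj = sample_trajectory([0, 1, 0], length=4, seed=1)
print(len(traj))
```

The initial state appears in the first pair, `length` further states follow it, and the last action is all `False` so that it leaves the final state unchanged.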

conditional_probability(*args, **kwargs)[source]#

Calculate the probability of \(s_1\) conditioned on \(s_0\); i.e., \(p(s_1|s_0)\).

conditional_log_probability_gradient(*args, **kwargs)[source]#

Calculate the gradient of the log-probability of \(s_1\) conditioned on \(s_0\), i.e., of \(\log p(s_1|s_0)\), w.r.t. the policy parameters.

property hyperparameters#
class State(*args, **kwargs)[source]#

Bases: object

Abstract base class defining a Reinforcement Learning state.

A state represents the current configuration of the environment (e.g., the current binary experimental design vector). Subclasses must implement __init__ to initialize the internal state representation.

__init__(*args, **kwargs)[source]#
property size#

Size of the internal state

property state#

Retrieve the internal state

class Action[source]#

Bases: object

Abstract base class defining a Reinforcement Learning action.

An action represents a modification to be applied to the current state (e.g., which bits of a binary design to flip). Subclasses must implement __init__ to initialize the internal action representation.

__init__()[source]#
property size#

Size of the internal action

property action#

Retrieve the internal action

class Agent(*args, **kwargs)[source]#

Bases: object

Abstract base class defining a Reinforcement Learning agent.

An agent encapsulates the policy, environment interaction, and training loop. It samples trajectories, computes rewards, estimates policy gradients, and updates the policy parameters.

__init__(*args, **kwargs)[source]#
trajectory_return(*args, **kwargs)[source]#

Given a trajectory (a sequence/list of state-action pairs), calculate the total reward.
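As a rough sketch of what a trajectory return can look like, the function below sums a per-step reward over the state-action pairs. It is a standalone illustration, not the PyOED method: the `reward` callable, the `discount` argument, and the toy `count_active` reward are assumptions introduced here.

```python
def trajectory_return(trajectory, reward, discount=1.0):
    """Sum (optionally discounted) rewards over state-action pairs."""
    total = 0.0
    for step, (state, action) in enumerate(trajectory):
        total += (discount ** step) * reward(state, action)
    return total


def count_active(state, action):
    """Toy reward: number of active entries in the state."""
    return float(sum(state))


trajectory = [
    ([0, 1, 0], [True, False, False]),
    ([1, 1, 0], [False, False, True]),
    ([1, 1, 1], [False, False, False]),
]
total = trajectory_return(trajectory, count_active)
print(total)  # 1 + 2 + 3 = 6.0
```

With `discount=1.0` this is a plain sum of the rewards along the trajectory.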

reward(*args, **kwargs)[source]#

Given the current state and action, return the value of the reward function. Inspect the policy if needed.

initialize_state(*args, **kwargs)[source]#

Initialize the state of the environment

update_state(*args, **kwargs)[source]#

Update the environment state to the passed state

sample_trajectory(*args, **kwargs)[source]#

Sample a trajectory of the passed length.

policy_gradient(*args, **kwargs)[source]#

Calculate policy gradient

train(*args, **kwargs)[source]#

Train the agent to optimize the policy

property policy#

Handle to the underlying policy

class FlipAction(size)[source]#

Bases: Action

A Reinforcement Learning action represented as a binary flip vector.

Each entry indicates whether the corresponding state entry is kept as-is (False) or flipped (True).

Parameters:

size (int) – size of the binary vector/state

__init__(size)[source]#
copy()[source]#
property size#

Size of the internal action

property action#

Retrieve the internal action

class BinaryState(size)[source]#

Bases: State

A Reinforcement Learning state represented as a binary vector.

Parameters:

size (int) – size of the binary vector/state

__init__(size)[source]#
copy()[source]#
update(action, in_place=True)[source]#

Given an action object, update the entries of the internal state accordingly. Depending on in_place, either update this state and return self, or return a copy holding the updated state.

Parameters:

action (FlipAction) – action to apply to the current state

Returns:

state resulting from applying the passed action to the current state
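The difference between updating in place and returning a copy can be sketched with a small stand-in class. `ToyBinaryState` is hypothetical and is not the PyOED class; it takes a plain list of booleans where PyOED expects a FlipAction, and it assumes that in_place=False leaves the original state untouched.

```python
class ToyBinaryState:
    """Hypothetical stand-in for a binary state; not the PyOED class."""

    def __init__(self, size):
        self.state = [False] * size

    def copy(self):
        other = ToyBinaryState(len(self.state))
        other.state = list(self.state)
        return other

    def update(self, flips, in_place=True):
        # Entries marked True are flipped; entries marked False are kept.
        target = self if in_place else self.copy()
        target.state = [s != f for s, f in zip(target.state, flips)]
        return target


s = ToyBinaryState(3)
t = s.update([True, False, True], in_place=False)  # s is left untouched
u = s.update([False, True, False])                 # s itself is modified
print(s.state, t.state, u is s)
```

Returning the updated object in both cases lets callers chain updates without caring which mode was used.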

class BernoulliPolicy(size, theta=0.5, random_seed=None)[source]#

Bases: Policy, RandomNumberGenerationMixin

A class defining a Reinforcement Learning policy based on the Bernoulli distribution:

\[p(x) = \theta^{x}\times (1-\theta)^{1-x}, \quad x \in \{0, 1\}\]

Parameters of the constructor are below.

Parameters:
  • size (int) – dimension of the Bernoulli random variable

  • theta (float) – success probability: \(p(x=1)\)

  • random_seed (None|int) – random seed to set the underlying random number generator

__init__(size, theta=0.5, random_seed=None)[source]#

Initialize the policy. Must be implemented by subclasses.

update_policy_parameters(theta)[source]#

Update theta; that is \(p(x=1)\)

Parameters:

theta (float) – probability of success \(p(x=1)\)

sample_action()[source]#

Sample an appropriate action

sample_trajectory(init_state, length)[source]#

Use the policy probability distribution to sample a trajectory, starting from the passed initial state

Parameters:
  • init_state (State) – initial state of the trajectory to sample

  • length (int) – length of the trajectory; number of state-action pairs to be sampled

Returns:

trajectory; list containing state-action pairs [(state, action), ..., (state, action)].

Note

  • The action associated with the last state should do nothing to the corresponding state.

  • The number of entries in the trajectory is equal to the passed length + 1; i.e., length states are generated after the initial state, which is used in the first pair in the trajectory.

Raises:

TypeError – if init_state is not an instance of the pyoed.ml.reinforcement_learning.State class

conditional_probability(s0, s1, log_probability=True)[source]#

Calculate the probability of \(s_1\) conditioned on \(s_0\); i.e., \(p(s_1|s_0)\).

Here, we assume iid entries of states

Parameters:
  • s0 (State) – Multivariate Bernoulli state

  • s1 (State) – Multivariate Bernoulli state

  • log_probability (bool) – whether to return the log-probability (True) or the probability itself (False)

Returns:

p; value of the conditional probability
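For intuition, the Bernoulli mass function extends to a binary vector with independent entries by multiplying the per-entry terms, or by summing their logarithms. The sketch below evaluates that product for a vector `x`. It is a standalone illustration rather than the PyOED method: the function name is made up, `theta` is assumed to be given per entry with values strictly between 0 and 1, and the dependence on \(s_0\) is not modeled.

```python
import math


def bernoulli_probability(x, theta, log_probability=True):
    """Joint (log-)probability of a binary vector with independent entries."""
    log_p = 0.0
    for xi, ti in zip(x, theta):
        # log of theta^x * (1 - theta)^(1 - x) for a single entry
        log_p += math.log(ti) if xi else math.log(1.0 - ti)
    return log_p if log_probability else math.exp(log_p)


p = bernoulli_probability([1, 0, 1], [0.5, 0.25, 0.8], log_probability=False)
log_p = bernoulli_probability([1, 0, 1], [0.5, 0.25, 0.8])
print(p, log_p)  # p is 0.5 * 0.75 * 0.8
```

Working in log space avoids underflow when the vector is long, which is one reason a log-probability flag is convenient.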

conditional_log_probability_gradient(s0, s1)[source]#

Calculate the gradient of the log-probability of \(s_1\) conditioned on \(s_0\), i.e., of \(\log p(s_1|s_0)\), w.r.t. the parameters \(\theta\). Here, we assume iid entries of states.

Parameters:
  • s0 (State) – Multivariate Bernoulli state

  • s1 (State) – Multivariate Bernoulli state

Returns:

log_prob_grad; a vector containing the derivatives of the log-probability of the multivariate Bernoulli distribution w.r.t. the parameters \(\theta\)
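For independent Bernoulli entries, differentiating \(\log p(x) = x \log\theta + (1-x)\log(1-\theta)\) gives \(x/\theta - (1-x)/(1-\theta)\) for each entry. The sketch below computes that expression and checks it against central finite differences. It is a standalone illustration, not the PyOED method: the function names and the per-entry `theta` vector are assumptions, and the role of \(s_0\) is not modeled.

```python
import math


def log_probability(x, theta):
    """Log-probability of a binary vector with independent Bernoulli entries."""
    return sum(math.log(t) if xi else math.log(1.0 - t) for xi, t in zip(x, theta))


def log_probability_gradient(x, theta):
    """Entry i is x_i / theta_i - (1 - x_i) / (1 - theta_i)."""
    return [xi / t - (1 - xi) / (1.0 - t) for xi, t in zip(x, theta)]


def finite_difference(x, theta, eps=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((log_probability(x, up) - log_probability(x, down)) / (2 * eps))
    return grad


x, theta = [1, 0, 1], [0.5, 0.25, 0.8]
analytic = log_probability_gradient(x, theta)
numeric = finite_difference(x, theta)
print(analytic)
print(numeric)
```

A finite-difference comparison like this is a cheap sanity check when implementing the gradient of a custom policy.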