Machine Learning Tools#
This package provides machine learning components for solving OED problems, currently focused on reinforcement learning (RL). The RL approach treats OED as a sequential decision-making problem where a learned policy maps the current experimental design (state) to a modification (action).
Note
This page is also accessible from the Optimization subpackage, where ReinforceOptimizer and its binary variants consume these components.
Key Classes at a Glance#
| Class | Description |
|---|---|
| Policy | Abstract base class defining a Reinforcement Learning policy for OED. |
| State | Abstract base class defining a Reinforcement Learning state. |
| Action | Abstract base class defining a Reinforcement Learning action. |
| Agent | Abstract base class defining a Reinforcement Learning agent. |
| BernoulliPolicy | A class defining a Reinforcement Learning policy based on the Bernoulli distribution. |
| BinaryState | A Reinforcement Learning state represented as a binary vector. |
| FlipAction | A Reinforcement Learning action represented as a binary flip vector. |
Reinforcement Learning Concepts in PyOED#
The PyOED RL framework maps directly onto standard RL terminology:
| RL concept | PyOED class | Description |
|---|---|---|
| State | State | Current experimental design vector \(\zeta\) |
| Action | Action | Modification to the design (e.g., flip a sensor on/off) |
| Policy | Policy | Distribution over actions given the current state |
| Agent | Agent | Runs episodes, accumulates rewards, and updates the policy via REINFORCE |
| Reward | OED criterion value | How good the current design is (e.g., negative A-optimality) |
Built-in policies:
BernoulliPolicy: an independent Bernoulli distribution per sensor; suitable for unconstrained binary OED.
Built-in states / actions:
BinaryState: binary design vector with copy() and update() helpers.
FlipAction: flips a single entry of the binary design.
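To make the mapping concrete, here is a minimal sketch of one state-action-reward step using plain Python lists. It does not use PyOED's classes: `sample_action`, `apply_action`, and `toy_reward` are hypothetical names, the reward is a made-up stand-in for an OED criterion, and treating `theta` as per-sensor activation probabilities is an assumption borrowed from the Bernoulli policy description.

```python
import random

def sample_action(theta, state, rng):
    """Policy (assumed form): draw a target design with p(x_i = 1) = theta[i],
    then flag the entries where it differs from the current state."""
    target = [1 if rng.random() < p else 0 for p in theta]
    return [t != s for t, s in zip(target, state)]

def apply_action(state, action):
    """Action: toggle every entry of the binary design whose flag is True."""
    return [1 - s if flip else s for s, flip in zip(state, action)]

def toy_reward(state):
    """Reward: hypothetical criterion that prefers exactly two active sensors."""
    return -abs(sum(state) - 2)

rng = random.Random(0)
theta = [0.5, 0.5, 0.5, 0.5]   # policy parameters, one per candidate sensor
state = [1, 0, 0, 1]           # current design: sensors 0 and 3 are active

action = sample_action(theta, state, rng)
next_state = apply_action(state, action)
print(action, next_state, toy_reward(next_state))
```

An agent would repeat this step many times and use the observed rewards to adjust `theta`.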
Reinforcement Learning#
Reinforcement Learning (Policy Gradient) tailored for binary OED optimization.
This module provides base classes for RL-based OED, including abstract
Policy, State, and Action classes that
define the RL interface for experimental design problems.
This work was developed initially under the DOERL package: ahmedattia/doerl
Note
This was part of Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments; see: https://arxiv.org/abs/2101.05958
- class Policy(*args, **kwargs)[source]#
Bases: object
Abstract base class defining a Reinforcement Learning policy for OED.
A policy maps the current state of the environment (experiment) to a probability distribution over actions (design decisions). Subclasses must implement all methods.
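import numpy as np

# Policy is the abstract base class documented in this section.
class MyPolicy(Policy):
    def __init__(self, size):
        self._size = size
        # Start with an activation probability of 0.5 for every entry.
        self._params = np.ones(size) * 0.5

    def update_policy_parameters(self, new_params):
        self._params = new_params

    def sample_action(self, state):
        # Draw one independent Bernoulli sample per entry.
        return np.random.binomial(1, self._params)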
- update_policy_parameters(*args, **kwargs)[source]#
Update the policy parameters (e.g., after a gradient step).
- Parameters:
args – implementation-specific positional arguments.
kwargs – implementation-specific keyword arguments.
- sample_action(*args, **kwargs)[source]#
Use the policy probability distribution to sample a single action.
- Returns:
the sampled action.
- sample_trajectory(*args, **kwargs)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state.
- Parameters:
init_state – the initial state to start the trajectory from.
length – length of the trajectory; number of state-action pairs to be sampled.
- Returns:
trajectory as a list containing state-action pairs
[(state, action), ..., (state, action)]. The action associated with the last state should do nothing to the corresponding state. The number of entries in the trajectory is equal to length + 1; that is, length states are generated after the initial state, which is used in the first pair in the trajectory.
- Return type:
list
- conditional_probability(*args, **kwargs)[source]#
Calculate the probability of \(s_1\) conditioned on \(s_0\); i.e., \(p(s_1|s_0)\).
- conditional_log_probability_gradient(*args, **kwargs)[source]#
Calculate the gradient of the log-probability of \(s_1\) conditioned on \(s_0\), i.e., \(\nabla \log p(s_1|s_0)\), with respect to the policy parameters.
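As an illustration of what such a gradient looks like, the sketch below works it out for independent Bernoulli entries and checks the analytic formula against central finite differences. It assumes the conditional does not actually depend on \(s_0\), so that \(\log p(s_1|s_0) = \sum_i [x_i \log\theta_i + (1-x_i)\log(1-\theta_i)]\); that assumption and the helper names `log_prob` and `log_prob_gradient` are illustrative and not taken from PyOED.

```python
import math

def log_prob(theta, s1):
    """log p(s1) for independent Bernoulli entries with p(x_i = 1) = theta[i]."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for p, x in zip(theta, s1))

def log_prob_gradient(theta, s1):
    """Analytic gradient: d/dtheta_i log p = x_i / theta_i - (1 - x_i) / (1 - theta_i)."""
    return [x / p - (1 - x) / (1.0 - p) for p, x in zip(theta, s1)]

theta = [0.3, 0.6, 0.9]
s1 = [1, 0, 1]
grad = log_prob_gradient(theta, s1)

# Central finite differences as an independent check of the analytic formula.
eps = 1e-6
fd = []
for i in range(len(theta)):
    up = theta[:i] + [theta[i] + eps] + theta[i + 1:]
    down = theta[:i] + [theta[i] - eps] + theta[i + 1:]
    fd.append((log_prob(up, s1) - log_prob(down, s1)) / (2 * eps))

print(grad)
print(fd)
```

The two printed vectors should agree to several digits. The formula is only defined for parameters strictly between 0 and 1.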
- property hyperparameters#
- class State(*args, **kwargs)[source]#
Bases: object
Abstract base class defining a Reinforcement Learning state.
A state represents the current configuration of the environment (e.g., the current binary experimental design vector). Subclasses must implement __init__ to initialize the internal state representation.
- property size#
Size of the internal state
- property state#
Retrieve the internal state
- class Action[source]#
Bases: object
Abstract base class defining a Reinforcement Learning action.
An action represents a modification to be applied to the current state (e.g., which bits of a binary design to flip). Subclasses must implement __init__ to initialize the internal action representation.
- property size#
Size of the internal action
- property action#
Retrieve the internal action
- class Agent(*args, **kwargs)[source]#
Bases: object
Abstract base class defining a Reinforcement Learning agent.
An agent encapsulates the policy, environment interaction, and training loop. It samples trajectories, computes rewards, estimates policy gradients, and updates the policy parameters.
- trajectory_return(*args, **kwargs)[source]#
Given a trajectory (a sequence/list of state-action pairs), calculate the total reward.
- reward(*args, **kwargs)[source]#
Given the current state and action, return the value of the reward function. Inspect the policy if needed.
- property policy#
Handle to the underlying policy
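The sample-reward-update cycle that an agent runs can be sketched as follows. This is a simplified REINFORCE loop under stated assumptions, not PyOED's implementation: each sample is a single design rather than a full trajectory, the policy is an independent Bernoulli distribution, no baseline is used, and `reward`, `reinforce_step`, the made-up target design, the step size, and the clipping bounds are all hypothetical choices.

```python
import random

def reward(state):
    """Hypothetical reward: negative Hamming distance to a made-up 'best' design."""
    best = [1, 0, 1, 0]
    return -sum(s != b for s, b in zip(state, best))

def reinforce_step(theta, rng, n_samples=32, step=0.05):
    """One REINFORCE update: theta += step * mean(reward * grad log p(design))."""
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        design = [1 if rng.random() < p else 0 for p in theta]
        r = reward(design)
        for i, (x, p) in enumerate(zip(design, theta)):
            # Score function of a Bernoulli entry, weighted by the reward.
            grad[i] += r * (x / p - (1 - x) / (1.0 - p)) / n_samples
    # Gradient ascent on the expected reward; clip so theta stays a valid probability.
    return [min(0.99, max(0.01, p + step * g)) for p, g in zip(theta, grad)]

rng = random.Random(0)
theta = [0.5] * 4
for _ in range(200):
    theta = reinforce_step(theta, rng)
print([round(p, 2) for p in theta])
```

Clipping keeps the score function finite, since it divides by both \(\theta_i\) and \(1-\theta_i\). Practical implementations often subtract a baseline from the reward to reduce the variance of the gradient estimate.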
- class FlipAction(size)[source]#
Bases: Action
A Reinforcement Learning action represented as a binary flip vector.
Each entry indicates whether the corresponding state entry is kept as-is (False) or flipped (True).
- Parameters:
size (int) – size of the binary vector/state
- property size#
Size of the internal action
- property action#
Retrieve the internal action
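The flip-vector convention can be illustrated with plain lists. In this sketch, `flip_between` and `no_op` are hypothetical helpers rather than PyOED functions: the first builds the flip vector that turns one binary state into another, and the second builds the do-nothing action.

```python
def flip_between(s0, s1):
    """Flip vector that turns binary state s0 into s1 (True where entries differ)."""
    return [a != b for a, b in zip(s0, s1)]

def no_op(size):
    """Flip vector that leaves every entry as-is."""
    return [False] * size

s0 = [1, 0, 0, 1]
s1 = [1, 1, 0, 0]
print(flip_between(s0, s1))              # [False, True, False, True]
print(flip_between(s1, s1) == no_op(4))  # True
```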
- class BinaryState(size)[source]#
Bases: State
A Reinforcement Learning state represented as a binary vector.
- Parameters:
size (int) – size of the binary vector/state
- update(action, in_place=True)[source]#
Given an action object, flip the state entries selected by the action. If in_place is True, the internal state is updated and self is returned; otherwise, a copy with the updated state is returned.
- Parameters:
action (FlipAction) – action to apply to the current state
- Returns:
state resulting from applying the passed action to the current state
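The difference between updating in place and returning a copy can be sketched with a small stand-in class. `SketchState` is hypothetical and is not PyOED's `BinaryState`; in particular, starting from an all-zeros design and storing the state as a list are assumptions made for illustration.

```python
import copy

class SketchState:
    """Hypothetical stand-in for a binary state; not PyOED's BinaryState."""

    def __init__(self, size):
        self.state = [0] * size  # assumed initial design: all entries off

    def update(self, action, in_place=True):
        """Toggle entries flagged True in `action`; return self or an updated copy."""
        target = self if in_place else copy.deepcopy(self)
        target.state = [1 - s if flip else s for s, flip in zip(target.state, action)]
        return target

s = SketchState(3)
flipped_copy = s.update([True, False, True], in_place=False)
print(s.state, flipped_copy.state)  # [0, 0, 0] [1, 0, 1]
same = s.update([False, True, False])
print(same is s, s.state)           # True [0, 1, 0]
```

Returning a copy is convenient when building a trajectory, because every stored state must stay unchanged after later actions are applied.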
- class BernoulliPolicy(size, theta=0.5, random_seed=None)[source]#
Bases: Policy, RandomNumberGenerationMixin
A class defining a Reinforcement Learning policy based on the Bernoulli distribution
\[p(x) = \theta^{x}\times (1-\theta)^{1-x}, \quad x \in \{0, 1\}\]
Parameters of the constructor are below.
- Parameters:
- __init__(size, theta=0.5, random_seed=None)[source]#
Initialize the policy. Must be implemented by subclasses.
- update_policy_parameters(theta)[source]#
Update theta; that is \(p(x=1)\)
- Parameters:
theta (float) – probability of success \(p(x=1)\)
- sample_trajectory(init_state, length)[source]#
Use the policy probability distribution to sample a trajectory, starting from the passed initial state
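- Parameters:
init_state – the initial state to start the trajectory from.
length – length of the trajectory; number of state-action pairs to be sampled.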
- Returns:
trajectory; a list containing state-action pairs [(state, action), ..., (state, action)].
Note
The action associated with the last state should do nothing to the corresponding state.
The number of entries in the trajectory is equal to length + 1; that is, length states are generated after the initial state, which is used in the first pair in the trajectory.
- Raises:
TypeError – if init_state is not an instance of the pyoed.ml.reinforcement_learning.State class
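The trajectory layout described above, with length + 1 pairs and a do-nothing final action, can be sketched with plain lists. This is not PyOED's implementation: the function below is a hypothetical stand-in that takes `theta` and a random generator explicitly, and it assumes each new state is drawn entrywise with \(p(x_i = 1) = \theta_i\), with the action recorded as the flips needed to reach it.

```python
import random

def sample_trajectory(init_state, length, theta, rng):
    """Return [(state, action), ...] with length + 1 pairs; the last action is a no-op."""
    trajectory = []
    state = list(init_state)
    for _ in range(length):
        # Assumed policy: draw the next design, then record the flips that reach it.
        target = [1 if rng.random() < p else 0 for p in theta]
        action = [t != s for t, s in zip(target, state)]
        trajectory.append((state, action))
        state = [1 - s if flip else s for s, flip in zip(state, action)]
    trajectory.append((state, [False] * len(state)))  # final state, do-nothing action
    return trajectory

traj = sample_trajectory([0, 0, 0], length=4, theta=[0.2, 0.5, 0.8], rng=random.Random(1))
print(len(traj))  # 5
```

Applying each recorded action to its state reproduces the next state in the list, which is a useful consistency check on any trajectory sampler.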
- conditional_probability(s0, s1, log_probability=True)[source]#
Calculate the probability of \(s_1\) conditioned on \(s_0\); i.e., \(p(s_1|s_0)\).
Here, the entries of the states are assumed to be independent and identically distributed (i.i.d.).
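Under the independence assumption, the probability of a binary state factorizes into a product of per-entry Bernoulli terms, which is most safely accumulated as a sum of logarithms. The sketch below is a hedged illustration, not PyOED's method: it takes `theta` explicitly, omits \(s_0\) on the assumption that the distribution of \(s_1\) does not depend on it, and only mirrors the `log_probability` flag from the signature above.

```python
import math

def conditional_probability(theta, s1, log_probability=True):
    """p(s1) for independent Bernoulli entries, or its logarithm when requested."""
    # Sum of per-entry log terms: log(theta_i) if x_i = 1, else log(1 - theta_i).
    log_p = sum(math.log(p if x == 1 else 1.0 - p) for p, x in zip(theta, s1))
    return log_p if log_probability else math.exp(log_p)

theta = [0.5, 0.25, 0.5]
s1 = [1, 0, 1]
p = conditional_probability(theta, s1, log_probability=False)
log_p = conditional_probability(theta, s1)
print(p, log_p)
```

Here the probability is the product 0.5 × 0.75 × 0.5. The sketch requires every parameter to lie strictly between 0 and 1, since the logarithm is undefined at zero.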