policy module¶
Module containing parent class for policies.
class policy.Policy(action_space, feature_indices=None, value_function=None, action_equality=<function equal_twists>, *args, **kwargs)[source]¶

Parent class for policies.

Inherit this class to make a policy. Action selection is based on maintaining a pi array which holds the action selection probabilities. A short construction sketch follows the attribute list below.

Parameters:
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function (fun, optional) – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- action_equality (fun, optional) – The function used to compare two action objects to determine whether they are equivalent. Returns True if the actions are equivalent and False otherwise.
- feature_indices (numpy array of bool, optional) – Indices of the feature vector corresponding to the indices used by the value_function.
Attributes:
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function (fun) – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- action_equality (fun) – The function used to compare two action objects to determine whether they are equivalent. Returns True if the actions are equivalent and False otherwise.
- feature_indices (numpy array of bool) – The indices of the feature vector corresponding to the indices used by the value_function.
- pi (numpy array of float) – Numpy array containing the probability of each action at the corresponding index in action_space.
- last_index (int) – The index of the last action chosen by the policy.
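For orientation, here is a minimal construction sketch. It assumes the class is importable as policy.Policy, that plain integers can stand in for actions when a matching action_equality is supplied, and that pi exists on the instance after construction; none of these details are guaranteed by this page.

    import numpy as np
    from policy import Policy

    # Three placeholder actions; for plain Python objects, equality is just ==.
    actions = np.array([0, 1, 2])

    pol = Policy(action_space=actions,
                 action_equality=lambda a, b: a == b)

    # pi is documented to hold one probability per entry of action_space;
    # a concrete subclass is responsible for keeping it up to date via update().
    print(pol.action_space)
    print(pol.pi)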
choose_action(*args, **kwargs)[source]¶

Updates last_index and chooses an action according to pi.

Parameters:
- *args – Ignored.
- **kwargs – Ignored.

Returns: The action at the sampled index.
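Continuing the sketch above, a hedged usage example; it assumes pi has already been populated (for instance by a subclass's update()).

    # Sample an action according to the current pi.
    action = pol.choose_action()

    # last_index records which entry of action_space was just sampled,
    # so this refers to the same action that was returned above.
    same_action = pol.action_space[pol.last_index]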
get_probability(action, choice=True, *args, **kwargs)[source]¶

Get the probability of taking the provided action.

This function can usually be used without being overridden. Raises an error if the provided action is not equal to any action in action_space according to action_equality.

Parameters:
- action (action) – The action whose probability is looked up.
- choice (bool) – If set to True, updates last_index.
- *args – Ignored.
- **kwargs – Ignored.

Returns: The float from pi corresponding to action.
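Again continuing the sketch, the lookup might be used as follows; choice=False is shown on the assumption that a caller sometimes wants to query pi without touching last_index.

    # Probability of action 0 under the current pi, without updating last_index.
    p = pol.get_probability(0, choice=False)

    # An action that matches nothing in action_space (according to
    # action_equality) raises an error instead of returning 0.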
update(phi, observation, *args, **kwargs)[source]¶

Updates the probabilities of taking each action.

This function should be overridden when creating a new policy, as in the sketch below. It takes a state (phi, observation) and modifies the pi array accordingly.

Parameters:
- phi (numpy array of bool) – Binary feature vector.
- observation (dictionary) – User-defined dictionary containing miscellaneous information about the state that should not be included in the feature vector phi.
- *args – Ignored.
- **kwargs – Ignored.
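To illustrate the intended subclassing pattern, here is a hedged sketch of an epsilon-greedy policy that overrides update(). The (phi, action) call signature for value_function, the use of feature_indices as a boolean mask, and the epsilon-greedy rule itself are assumptions made for this example, not behaviour promised by the base class.

    import numpy as np
    from policy import Policy

    class EpsilonGreedyPolicy(Policy):
        """Illustrative subclass: mostly greedy with respect to value_function."""

        def __init__(self, epsilon=0.1, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.epsilon = epsilon

        def update(self, phi, observation, *args, **kwargs):
            # Keep only the features the value function uses
            # (assumes feature_indices was supplied as a boolean mask).
            phi = phi[self.feature_indices]

            # Score every action; the (phi, action) signature is an
            # assumption for this sketch.
            values = np.array([self.value_function(phi, a)
                               for a in self.action_space])

            # Epsilon-greedy distribution over action_space.
            n = len(self.action_space)
            self.pi = np.full(n, self.epsilon / n)
            self.pi[np.argmax(values)] += 1.0 - self.epsilon

After each call to update(), choose_action() would then sample the highest-valued action with probability 1 - epsilon plus a small exploration term.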