policy module¶
Module containing parent class for policies.
class policy.Policy(action_space, feature_indices=None, value_function=None, action_equality=<function equal_twists>, *args, **kwargs)[source]¶

Parent class for policies.

Inherit this class to make a policy. Action selection is based on maintaining a pi array which holds the action selection probabilities. A short construction sketch follows the attribute list below.

Parameters:
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function (fun, optional) – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- action_equality (fun, optional) – The function used to compare two action objects to determine whether they are equivalent. Returns True if the actions are equivalent and False otherwise.
- feature_indices (numpy array of bool, optional) – Indices of the feature vector corresponding to the indices used by the value_function.
Attributes:
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function (fun) – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- action_equality (fun) – The function used to compare two action objects to determine whether they are equivalent. Returns True if the actions are equivalent and False otherwise.
- feature_indices (numpy array of bool) – The indices of the feature vector corresponding to the indices used by the value_function.
- pi (numpy array of float) – Numpy array containing the probability of each action at the corresponding index in action_space.
- last_index (int) – The index of the last action chosen by the policy.
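For orientation, here is a minimal construction sketch. It assumes the class is importable as policy.Policy, that plain integers can stand in for actions when a matching action_equality is supplied, and that pi exists on the instance after construction; none of these details are guaranteed by this page.

    import numpy as np
    from policy import Policy

    # Three placeholder actions; for plain Python objects, equality is just ==.
    actions = np.array([0, 1, 2])

    pol = Policy(action_space=actions,
                 action_equality=lambda a, b: a == b)

    # pi is documented to hold one probability per entry of action_space;
    # a concrete subclass is responsible for keeping it up to date via update().
    print(pol.action_space)
    print(pol.pi)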
choose_action(*args, **kwargs)[source]¶

Updates last_index and chooses an action according to pi.

Parameters:
- *args – Ignored.
- **kwargs – Ignored.

Returns: The action at the sampled index.
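Continuing the sketch above, a hedged usage example; it assumes pi has already been populated (for instance by a subclass's update()).

    # Sample an action according to the current pi.
    action = pol.choose_action()

    # last_index records which entry of action_space was just sampled,
    # so this refers to the same action that was returned above.
    same_action = pol.action_space[pol.last_index]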
get_probability(action, choice=True, *args, **kwargs)[source]¶

Get the probability of taking the provided action.

This function can usually be used without being overridden. Raises an error if the provided action is not equal to any action in action_space according to action_equality.

Parameters:
- action (action) – The action whose probability is looked up.
- choice (bool) – If set to True, updates last_index.
- *args – Ignored.
- **kwargs – Ignored.

Returns: The float from pi corresponding to action.
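Again continuing the sketch, the lookup might be used as follows; choice=False is shown on the assumption that a caller sometimes wants to query pi without touching last_index.

    # Probability of action 0 under the current pi, without updating last_index.
    p = pol.get_probability(0, choice=False)

    # An action that matches nothing in action_space (according to
    # action_equality) raises an error instead of returning 0.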
update(phi, observation, *args, **kwargs)[source]¶

Updates the probabilities of taking each action.

This function should be overridden when creating a new policy, as in the sketch below. It takes a state (phi, observation) and modifies the pi array accordingly.

Parameters:
- phi (numpy array of bool) – Binary feature vector.
- observation (dictionary) – User-defined dictionary containing miscellaneous information about the state that should not be included in the feature vector phi.
- *args – Ignored.
- **kwargs – Ignored.
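To illustrate the intended subclassing pattern, here is a hedged sketch of an epsilon-greedy policy that overrides update(). The (phi, action) call signature for value_function, the use of feature_indices as a boolean mask, and the epsilon-greedy rule itself are assumptions made for this example, not behaviour promised by the base class.

    import numpy as np
    from policy import Policy

    class EpsilonGreedyPolicy(Policy):
        """Illustrative subclass: mostly greedy with respect to value_function."""

        def __init__(self, epsilon=0.1, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.epsilon = epsilon

        def update(self, phi, observation, *args, **kwargs):
            # Keep only the features the value function uses
            # (assumes feature_indices was supplied as a boolean mask).
            phi = phi[self.feature_indices]

            # Score every action; the (phi, action) signature is an
            # assumption for this sketch.
            values = np.array([self.value_function(phi, a)
                               for a in self.action_space])

            # Epsilon-greedy distribution over action_space.
            n = len(self.action_space)
            self.pi = np.full(n, self.epsilon / n)
            self.pi[np.argmax(values)] += 1.0 - self.epsilon

After each call to update(), choose_action() would then sample the highest-valued action with probability 1 - epsilon plus a small exploration term.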