gvf module

Authors:
Banafsheh Rafiee, Niko Yasui
class gvf.GVF(cumulant, gamma, target_policy, num_features, alpha0, alpha, name, learner, feature_indices, use_MSRE=False, **kwargs)

Implements General Value Functions.

A General Value Function poses a predictive question defined by the cumulant, gamma, and the target policy. The answer to that question is learned by a learning algorithm, here called the learner.
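To illustrate how the cumulant and gamma together define a prediction target, the sketch below computes a return over a short recorded trajectory. The observation format (dicts with a "bump" key), the helper name gvf_return, and the indexing convention are assumptions made for illustration; none of them come from the gvf module.

```python
def gvf_return(observations, cumulant, gamma):
    """Accumulate the return over a finite list of observations.

    Walks the trajectory backwards using the recursion
    G = cumulant(obs) + gamma(obs) * G_next.
    The exact indexing convention is an assumption of this sketch.
    """
    g = 0.0
    for obs in reversed(observations):
        g = cumulant(obs) + gamma(obs) * g
    return g


def bump_cumulant(obs):
    # Hypothetical cumulant: 1.0 when the bump sensor fires, else 0.0.
    return float(obs["bump"])


def bump_gamma(obs):
    # Hypothetical gamma: terminate (0.0) on a bump, otherwise discount by 0.9.
    return 0.0 if obs["bump"] else 0.9


trajectory = [{"bump": 0}, {"bump": 0}, {"bump": 1}]
result = gvf_return(trajectory, bump_cumulant, bump_gamma)
```

With this choice of cumulant and gamma, the return is the discounted count of steps until the next bump, which is the kind of question a GVF is meant to answer.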

Parameters:
  • cumulant (fun) – Function of observation that gives a float value.
  • gamma (fun) – Function of observation that gives a float value. Together with the cumulant, it defines the return that the agent tries to predict.
  • target_policy (Policy) – Policy under which the agent makes its predictions. Can be the same as the behavior policy.
  • num_features (int) – Number of features that are used.
  • alpha0 (float) – Value to calculate beta0 for RUPEE.
  • alpha (float) – Value to calculate alpha for RUPEE.
  • name (str) – Name of the GVF for recording data.
  • learner – Instance of a class that provides predict and update methods, and theta, tderr_elig, and delta attributes. For example, GTD.
  • feature_indices (numpy array of bool) – Boolean mask marking which features to use.
  • use_MSRE (bool) – Whether or not to calculate MSRE.
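The parameters above might be prepared along the following lines. This is a minimal sketch: the observation layout, the function names, and the feature-vector size are assumptions, and the sketch stops short of calling the GVF constructor itself because the target policy and learner objects are not described in this section.

```python
import numpy as np


def bump_cumulant(observation):
    # Hypothetical cumulant: reads a "bump" entry from the observation.
    return float(observation["bump"])


def constant_gamma(observation):
    # Hypothetical gamma: the same discount for every observation.
    return 0.9


# Boolean mask over a hypothetical 6-dimensional feature vector.
num_total = 6
feature_indices = np.zeros(num_total, dtype=bool)
feature_indices[[0, 2, 5]] = True

# Indexing with the mask keeps only the selected features.
phi_full = np.arange(num_total, dtype=float)
phi = phi_full[feature_indices]
```

Here phi holds the entries of phi_full at positions 0, 2, and 5. Whether the class applies the mask internally or expects an already-masked vector is not stated in this section.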
predict(phi, action=None, **kwargs)
update(last_observation, phi, last_action, observation, phi_prime, mu, action)
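For orientation, a learner object of the kind described under the learner parameter could look roughly like the sketch below. It is a hypothetical linear TD(0) stand-in, not the module's GTD, and its update signature is an assumption rather than the interface GVF.update actually calls. The tderr_elig attribute is set to delta * phi purely as a placeholder, since its definition is not documented in this section.

```python
import numpy as np


class LinearTD0:
    """Hypothetical stand-in learner; not the module's GTD."""

    def __init__(self, num_features, alpha):
        self.alpha = alpha
        self.theta = np.zeros(num_features)       # weight vector
        self.tderr_elig = np.zeros(num_features)  # placeholder (assumption)
        self.delta = 0.0                          # most recent TD error

    def predict(self, phi):
        # Linear prediction: inner product of weights and features.
        return float(np.dot(self.theta, phi))

    def update(self, phi, phi_prime, cumulant, gamma):
        # TD error: cumulant + gamma * v(phi_prime) - v(phi)
        self.delta = (cumulant
                      + gamma * self.predict(phi_prime)
                      - self.predict(phi))
        self.tderr_elig = self.delta * phi
        self.theta += self.alpha * self.tderr_elig


learner = LinearTD0(num_features=3, alpha=0.5)
phi = np.array([1.0, 0.0, 0.0])
phi_prime = np.array([0.0, 1.0, 0.0])
learner.update(phi, phi_prime, cumulant=1.0, gamma=0.9)
prediction = learner.predict(phi)
```

After one update from zero weights, the TD error is 1.0 and the prediction for phi moves halfway toward it because alpha is 0.5.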