gvf module

Authors: Banafsheh Rafiee, Niko Yasui

class gvf.GVF(cumulant, gamma, target_policy, num_features, alpha0, alpha, name, learner, feature_indices, use_MSRE=False, **kwargs)

Implements General Value Functions.

A General Value Function (GVF) poses a predictive question defined by the cumulant, gamma, and the target policy. The answer to that question is learned by a learning algorithm, here called the learner.
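As a rough illustration of the question a GVF poses, the sketch below computes the return defined by a cumulant and a gamma function along a recorded sequence of observations. The observation format (a dict with a `bump` key), the two example functions, and the helper `discounted_return` are hypothetical and not part of this module. The recursion G_t = c(o_{t+1}) + gamma(o_{t+1}) * G_{t+1} is assumed here as the usual GVF convention rather than taken from the source.

```python
def cumulant(observation):
    """Hypothetical cumulant: 1.0 when the bump sensor fires, else 0.0."""
    return 1.0 if observation["bump"] else 0.0


def gamma(observation):
    """Hypothetical observation-dependent discount: 0.0 (terminate) on a bump."""
    return 0.0 if observation["bump"] else 0.9


def discounted_return(observations):
    """Return G for the step just before observations[0].

    Assumes G_t = c(o_{t+1}) + gamma(o_{t+1}) * G_{t+1}, with G = 0 past
    the end of the recorded sequence.
    """
    g = 0.0
    # Work backwards so each step reuses the return of its successor.
    for obs in reversed(observations):
        g = cumulant(obs) + gamma(obs) * g
    return g


trajectory = [{"bump": False}, {"bump": False}, {"bump": True}, {"bump": False}]
print(discounted_return(trajectory))
```

With these example functions the return answers "how soon will the bump sensor fire?": the bump two steps ahead is discounted twice, and nothing after the bump contributes because gamma is zero there.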

Parameters:

cumulant (fun) – Function of observation that gives a float value.
gamma (fun) – Function of observation that gives a float value. Together with cumulant, it defines the return that the agent tries to predict.
target_policy (Policy) – Policy under which the agent makes its predictions. Can be the same as the behavior policy.
num_features (int) – Number of features that are used.
alpha0 (float) – Value to calculate beta0 for RUPEE.
alpha (float) – Value to calculate alpha for RUPEE.
name (str) – Name of the GVF for recording data.
learner – Instance of a class that provides predict and update methods and theta, tderr_elig, and delta attributes. For example, GTD.
feature_indices (numpy array of bool) – Boolean mask marking which features to use.
use_MSRE (bool) – Whether or not to calculate MSRE.
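Because feature_indices is documented as an array of bool, it presumably acts as a mask over the feature vector. The sketch below shows that selection on plain Python lists with the standard-library `itertools.compress`, to stay dependency-free; the helper name `select_features` is hypothetical, and how the class itself applies the mask (or whether num_features counts features before or after masking) is not stated in this reference.

```python
from itertools import compress


def select_features(phi, feature_indices):
    """Keep phi[i] wherever feature_indices[i] is True (hypothetical helper)."""
    if len(phi) != len(feature_indices):
        raise ValueError("mask and feature vector must have the same length")
    return list(compress(phi, feature_indices))


phi = [0.5, 1.0, 0.0, 2.0]
feature_indices = [True, False, False, True]
selected = select_features(phi, feature_indices)
print(selected)
```

With a NumPy feature vector and a NumPy boolean mask, the equivalent selection would be written `phi[feature_indices]`.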

predict(phi, action=None, **kwargs)

update(last_observation, phi, last_action, observation, phi_prime, mu, action)
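To make the learner contract more concrete, here is a minimal stand-in that exposes predict, update, theta, and delta. It is a sketch under assumptions, not the module's implementation: the class `TD0Learner` is hypothetical, it uses plain linear TD(0) rather than GTD, its update signature is invented for illustration, and it leaves out tderr_elig and any use of mu because this reference does not define them.

```python
class TD0Learner:
    """Hypothetical stand-in learner: linear TD(0), not GTD."""

    def __init__(self, num_features, step_size):
        self.theta = [0.0] * num_features  # weight vector
        self.delta = 0.0                   # most recent TD error
        self.step_size = step_size

    def predict(self, phi):
        """Linear prediction: dot product of the weights and the features."""
        return sum(w * x for w, x in zip(self.theta, phi))

    def update(self, phi, phi_prime, cumulant, gamma):
        """One TD(0) step from features phi to phi_prime (assumed signature)."""
        # TD error: c + gamma * v(phi') - v(phi)
        self.delta = (cumulant
                      + gamma * self.predict(phi_prime)
                      - self.predict(phi))
        # Move each weight along its feature, scaled by the TD error.
        self.theta = [w + self.step_size * self.delta * x
                      for w, x in zip(self.theta, phi)]


learner = TD0Learner(num_features=2, step_size=0.5)
learner.update(phi=[1.0, 0.0], phi_prime=[0.0, 1.0], cumulant=1.0, gamma=0.9)
print(learner.delta, learner.theta)
```

In this sketch the caller evaluates the cumulant and gamma functions on the new observation and passes the resulting floats to the learner; whether GVF.update divides the work the same way is an assumption.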