gvf module
- Authors: Banafsheh Rafiee, Niko Yasui
class gvf.GVF(cumulant, gamma, target_policy, num_features, alpha0, alpha, name, learner, feature_indices, use_MSRE=False, **kwargs)
Implements General Value Functions.
A General Value Function poses a predictive question defined by the cumulant, gamma, and the target policy. The answer is learned by a learning algorithm, here called the learner.
Parameters:
- cumulant (fun) – Function of observation that gives a float value.
- gamma (fun) – Function of observation that gives a float value. Together with the cumulant, it defines the return that the agent tries to predict.
- target_policy (Policy) – Policy under which the agent makes its predictions. Can be the same as the behavior policy.
- num_features (int) – Number of features that are used.
- alpha0 (float) – Value to calculate beta0 for RUPEE.
- alpha (float) – Value to calculate alpha for RUPEE.
- name (str) – Name of the GVF for recording data.
- learner – Class instance with predict and update functions, and theta, tderr_elig, and delta attributes. For example, GTD.
- feature_indices (numpy array of bool) – Indices of the features to use.
- use_MSRE (bool) – Whether or not to calculate MSRE.
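
The parameters above can be illustrated with a minimal sketch. It shows the kind of objects the documentation describes: a cumulant and a gamma function of the observation, a boolean feature_indices mask, and a learner exposing predict, update, theta, tderr_elig, and delta. Everything specific here is an assumption made for illustration: the dict-shaped observation, the "bump" key, the SimpleTDLearner class (a plain linear TD(0) stand-in, not the library's GTD), and the update signature. The sketch does not call gvf.GVF itself, since its internals are not described in this section.

```python
import numpy as np


def cumulant(observation):
    """Hypothetical cumulant: the bump sensor reading as a float."""
    return float(observation["bump"])


def gamma(observation):
    """Hypothetical gamma: terminate (0.0) on a bump, else discount by 0.9."""
    return 0.0 if observation["bump"] else 0.9


class SimpleTDLearner:
    """Illustrative linear TD(0) learner with the attributes listed above.

    This is an assumed stand-in, not the GTD learner mentioned in the docs.
    """

    def __init__(self, num_features, step_size):
        self.theta = np.zeros(num_features)       # weight vector
        self.tderr_elig = np.zeros(num_features)  # TD error times features (assumed meaning)
        self.delta = 0.0                          # most recent TD error
        self.step_size = step_size

    def predict(self, phi):
        """Linear prediction: dot product of weights and features."""
        return float(np.dot(self.theta, phi))

    def update(self, phi, phi_next, cumulant_value, gamma_value):
        """One TD(0) step; the argument list is an assumption."""
        self.delta = (cumulant_value
                      + gamma_value * self.predict(phi_next)
                      - self.predict(phi))
        self.tderr_elig = self.delta * phi
        self.theta += self.step_size * self.tderr_elig
        return self.delta


# Boolean mask selecting which entries of the full feature vector are used.
feature_indices = np.array([True, False, True, True])
full_phi = np.array([1.0, 5.0, 0.0, 1.0])
full_phi_next = np.array([0.0, 2.0, 1.0, 1.0])
phi = full_phi[feature_indices]
phi_next = full_phi_next[feature_indices]

learner = SimpleTDLearner(num_features=int(feature_indices.sum()), step_size=0.1)

# One learning step on a hypothetical transition that ends in a bump.
next_observation = {"bump": 1}
learner.update(phi, phi_next, cumulant(next_observation), gamma(next_observation))
prediction = learner.predict(phi)
```

Because gamma returns 0.0 on a bump, the TD target for that step reduces to the cumulant alone, which is how a state-dependent gamma can express termination of the predicted return.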