wis_to_gtd module
class wis_to_gtd.WISTOGTD(num_features, u, eta, beta, lmbda, **kwargs)
Implements WIS-TO-GTD(lambda) with linear function approximation.
See https://armahmood.github.io/files/MS-WIS-O(n)-UAI-2015.pdf for more details.
Parameters:
- num_features (int) – Length of weight vectors.
- u (float) – Initial value for the usage vector. Can be interpreted as inverse initial step size.
- eta (float) – Recency-weighting factor. Can be interpreted as desired final step size.
- beta (float) – Secondary learning rate.
- lmbda (float) – Trace decay rate.
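
The parameters above suggest the state a learner of this kind holds after construction. The following is a minimal sketch, not the module's implementation: `WISTOGTDState` and `u_init` are hypothetical names, plain Python lists stand in for whatever array type the module uses, and zero initialization of every vector other than the usage vector is an assumption. Only the use of `u` as the initial value of the usage vector comes from the parameter description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WISTOGTDState:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""

    num_features: int
    u_init: float  # documented as `u`: initial value for the usage vector
    eta: float     # recency-weighting factor
    beta: float    # secondary learning rate
    lmbda: float   # trace decay rate
    theta: List[float] = field(init=False)
    w: List[float] = field(init=False)
    e: List[float] = field(init=False)
    u: List[float] = field(init=False)
    v: List[float] = field(init=False)

    def __post_init__(self) -> None:
        n = self.num_features
        # Assumption: weights, traces and the usage helper start at zero.
        self.theta = [0.0] * n
        self.w = [0.0] * n
        self.e = [0.0] * n
        self.v = [0.0] * n
        # From the parameter description: every usage entry starts at `u`.
        self.u = [self.u_init] * n


state = WISTOGTDState(num_features=4, u_init=1.0, eta=0.001, beta=0.01, lmbda=0.9)
```

Because `u` can be read as an inverse initial step size, a larger `u_init` in this sketch corresponds to a smaller initial step.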
Attributes:
- theta – Primary weight vector.
- w – Secondary weight vector.
- e – Eligibility trace vector.
- u – Usage vector.
- v – Usage helper vector.
- beta – Secondary learning rate.
- lmbda – Trace decay rate.
- old_gamma – Discounting parameter from the previous timestep.
- old_rho – Importance sampling weight from the previous timestep.
- delta – TD error from the previous timestep.
- tderr_elig – delta * e, stored for RUPEE calculations.
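
The `tderr_elig` attribute is described as the product `delta * e`. As a small illustration, assuming `delta` is a scalar and `e` a vector so that the product is an elementwise scaling, it could be computed as below. The function name and the sample values are hypothetical, and plain lists again stand in for the module's array type.

```python
from typing import List


def scale_trace_by_td_error(delta: float, e: List[float]) -> List[float]:
    """Multiply every component of the eligibility trace by the TD error."""
    return [delta * e_i for e_i in e]


delta = 0.5            # illustrative TD error
e = [1.0, 0.0, 2.0]    # illustrative eligibility trace
tderr_elig = scale_trace_by_td_error(delta, e)
```

Features with a zero trace contribute nothing to the product, whatever the TD error is.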