wis_to_gtd module

class wis_to_gtd.WISTOGTD(num_features, u, eta, beta, lmbda, **kwargs)

Implements WIS-TO-GTD(lambda) with linear function approximation.

See https://armahmood.github.io/files/MS-WIS-O(n)-UAI-2015.pdf for more details.

Parameters:
  • num_features (int) – Length of weight vectors.
  • u (float) – Initial value of every entry of the usage vector; can be interpreted as the inverse of the initial step size.
  • eta (float) – Recency-weighting factor; can be interpreted as the desired final step size.
  • beta (float) – Secondary learning rate.
  • lmbda (float) – Trace decay rate.
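Under the interpretation given for u and eta, each feature's effective step size is 1/u: it starts at 1/u and is intended to settle near eta. The following is a minimal sketch of how the state listed below might be set up from the constructor arguments. The helper name init_state, the zero initial weights and traces, and the defaults for old_gamma, old_rho, and delta are assumptions for illustration, not the module's actual code.

```python
import numpy as np


def init_state(num_features: int, u: float, eta: float, beta: float, lmbda: float) -> dict:
    """Hypothetical initializer mirroring the documented attributes.

    Zero weights/traces and the scalar defaults below are assumptions.
    """
    return {
        "theta": np.zeros(num_features),       # primary weight vector
        "w": np.zeros(num_features),           # secondary weight vector
        "e": np.zeros(num_features),           # eligibility trace
        "u": np.full(num_features, float(u)),  # usage vector: every entry starts at u
        "v": np.zeros(num_features),           # usage helper vector
        "eta": eta,        # not listed as an attribute; kept only for completeness
        "beta": beta,
        "lmbda": lmbda,
        "old_gamma": 0.0,  # assumed: no discounting carried into the first step
        "old_rho": 1.0,    # assumed: neutral importance-sampling weight
        "delta": 0.0,
        "tderr_elig": np.zeros(num_features),
    }


state = init_state(num_features=4, u=10.0, eta=0.01, beta=0.001, lmbda=0.9)
# Reading u as an inverse step size, the initial per-feature step size is 1/u.
alpha = 1.0 / state["u"]
print(alpha)  # [0.1 0.1 0.1 0.1]
```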

theta

Primary weight vector.

w

Secondary weight vector.

e

Eligibility trace vector.

u

Usage vector.

v

Usage helper vector.

beta

Secondary learning rate.

lmbda

Trace decay rate.

old_gamma

Discounting parameter from the previous timestep.

old_rho

Importance-sampling weight from the previous timestep.

delta

TD error from the previous timestep.

tderr_elig

The vector delta * e (TD error times eligibility trace), used for RUPEE calculations.
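One way a stored delta * e vector can feed a RUPEE-style error estimate is sketched below. The recursions (a vector h trained toward delta * e, an exponential moving average of delta * e without any bias correction, and the estimate sqrt(|h · avg|)), the step sizes alpha_h and beta0, and the function name rupee_step are assumptions for illustration and may differ from what the module actually computes.

```python
import numpy as np


def rupee_step(h, avg, tderr_elig, phi, alpha_h=0.01, beta0=0.01):
    """One hypothetical RUPEE-style update; returns (h, avg, estimate)."""
    # Move h toward the expected delta * e (same form as a secondary-weight update).
    h = h + alpha_h * (tderr_elig - (h @ phi) * phi)
    # Exponential moving average of delta * e (no bias correction here).
    avg = (1.0 - beta0) * avg + beta0 * tderr_elig
    # Unsigned, square-rooted projection used as the error estimate.
    estimate = np.sqrt(abs(h @ avg))
    return h, avg, estimate


phi = np.array([1.0, 0.0, 1.0])
e = np.array([0.5, 0.0, 1.0])
delta = 2.0
tderr_elig = delta * e  # the quantity the tderr_elig attribute stores
h, avg, est = rupee_step(np.zeros(3), np.zeros(3), tderr_elig, phi)
```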

predict(phi)
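With linear function approximation, a prediction is normally the inner product of the primary weight vector with the feature vector. Assuming predict follows that convention (the entry above does not state it), it reduces to the sketch below; the standalone function signature taking theta explicitly is hypothetical.

```python
import numpy as np


def predict(theta: np.ndarray, phi: np.ndarray) -> float:
    """Linear prediction theta . phi (assumed behavior of predict(phi))."""
    return float(theta @ phi)


theta = np.array([0.5, -1.0, 2.0])
phi = np.array([1.0, 0.0, 1.0])
print(predict(theta, phi))  # 2.5
```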
update(phi, phi_prime, cumulant, gamma, rho, **kwargs)
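To show how the update arguments relate to attributes such as theta, w, e, beta, lmbda, delta, tderr_elig, and old_gamma, here is a sketch of ordinary linear GTD(λ) with a single scalar step size alpha. This is a deliberately simpler, swapped-in algorithm: it omits the weighted-importance-sampling usage vectors u and v and the true-online corrections that distinguish WIS-TO-GTD(λ) (so it has no use for old_rho), and those recursions are given in the paper linked above. The class name GTDLambdaSketch, the alpha argument, and the trace convention are assumptions.

```python
import numpy as np


class GTDLambdaSketch:
    """Plain linear GTD(lambda); NOT the WIS-TO-GTD(lambda) update."""

    def __init__(self, num_features: int, alpha: float, beta: float, lmbda: float):
        self.theta = np.zeros(num_features)  # primary weights
        self.w = np.zeros(num_features)      # secondary weights
        self.e = np.zeros(num_features)      # eligibility trace
        self.alpha = alpha                   # hypothetical scalar primary step size
        self.beta = beta                     # secondary learning rate
        self.lmbda = lmbda                   # trace decay rate
        self.old_gamma = 0.0                 # assumed: no trace carried into the first step
        self.delta = 0.0
        self.tderr_elig = np.zeros(num_features)

    def update(self, phi, phi_prime, cumulant, gamma, rho):
        # Off-policy accumulating trace; the previous step's gamma decays the old trace.
        self.e = rho * (phi + self.old_gamma * self.lmbda * self.e)
        # TD error for the transition phi -> phi_prime.
        self.delta = cumulant + gamma * (self.theta @ phi_prime) - self.theta @ phi
        self.tderr_elig = self.delta * self.e
        # Primary update with the gradient-correction term (uses the old w).
        self.theta = self.theta + self.alpha * (
            self.tderr_elig - gamma * (1.0 - self.lmbda) * (self.e @ self.w) * phi_prime
        )
        # Secondary weights track the expected TD update delta * e.
        self.w = self.w + self.beta * (self.tderr_elig - (self.w @ phi) * phi)
        self.old_gamma = gamma
        return self.delta


agent = GTDLambdaSketch(num_features=2, alpha=0.1, beta=0.01, lmbda=0.9)
phi = np.array([1.0, 0.0])
phi_prime = np.array([0.0, 1.0])
delta = agent.update(phi, phi_prime, cumulant=1.0, gamma=0.9, rho=1.0)
```

In WIS-TO-GTD(λ) the scalar alpha is presumably replaced by per-feature step sizes derived from the usage vector u; consult the paper for the exact recursions.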