to_gtd module

class to_gtd.TOGTD(num_features, alpha, beta, lmbda, decay=False, **kwargs)

Implements True Online GTD(lambda) with linear function approximation.

Parameters:
  • num_features (int) – Length of weight vectors.
  • alpha (float) – Primary learning rate.
  • beta (float) – Secondary learning rate.
  • lmbda (float) – Trace decay rate.
  • decay (bool, optional) – Whether to decay alpha and beta.

theta

Primary weight vector.

w

Secondary weight vector.

e

Eligibility trace vector.

e_grad

Gradient correction trace vector.

e_w

Secondary eligibility trace vector.

alpha

Primary learning rate.

beta

Secondary learning rate.

lmbda

Trace decay rate.

old_gamma

Discounting parameter from the previous timestep.

delta

TD error from the previous timestep.

tderr_elig

The product delta * e of the TD error and the eligibility trace, kept for RUPEE calculations.
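For reference, here are the True Online GTD(λ) updates as published by van Hasselt, Mahmood and Sutton (2014). They are written with constant α, β and λ, which corresponds to decay=False. This is a sketch that assumes the class follows that paper; the source does not confirm it. The mapping from symbols to attributes is our reading of the attribute descriptions above: the paper's secondary vector h is renamed to w, and its trace e^h to e_w. The other correspondences are φ_t = phi, φ_{t+1} = phi_prime, C_{t+1} = cumulant, γ_{t+1} = gamma, γ_t = old_gamma, ρ_t = rho, and e^∇ = e_grad.

```latex
\begin{aligned}
\delta_t &= C_{t+1} + \gamma_{t+1}\,\theta_t^\top\phi_{t+1} - \theta_t^\top\phi_t \\
e_t &= \rho_t\left(\gamma_t\lambda\, e_{t-1} + \alpha\left(1 - \rho_t\gamma_t\lambda\, e_{t-1}^\top\phi_t\right)\phi_t\right) \\
e^{\nabla}_t &= \rho_t\left(\gamma_t\lambda\, e^{\nabla}_{t-1} + \phi_t\right) \\
e^{w}_t &= \rho_{t-1}\gamma_t\lambda\, e^{w}_{t-1} + \beta\left(1 - \rho_{t-1}\gamma_t\lambda\,(e^{w}_{t-1})^\top\phi_t\right)\phi_t \\
\theta_{t+1} &= \theta_t + \delta_t e_t + \left(e_t - \alpha\rho_t\phi_t\right)(\theta_t - \theta_{t-1})^\top\phi_t - \alpha\gamma_{t+1}(1-\lambda)\,(w_t^\top e^{\nabla}_t)\,\phi_{t+1} \\
w_{t+1} &= w_t + \rho_t\delta_t e^{w}_t - \beta\,(w_t^\top\phi_t)\,\phi_t
\end{aligned}
```

The last two lines show the algorithm's three traces at work. The dutch trace e_t drives the main TD step. The accumulating trace e^∇_t carries the GTD gradient correction. The trace e^w_t updates the secondary weights.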

predict(phi)
update(phi, phi_prime, cumulant, gamma, rho, **kwargs)
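To show how predict and update might fit together, here is a minimal pure-Python sketch of the published True Online GTD(λ) updates (van Hasselt, Mahmood and Sutton, 2014). The sketch makes the following assumptions:

- TOGTDSketch and the dot helper are hypothetical names, not part of to_gtd.
- The attribute names mirror the documented ones.
- old_theta (θ_{t-1}) and old_rho (ρ_{t-1}) are assumed extra state; they are not documented attributes.
- decay and **kwargs are not modeled, and alpha, beta and lmbda are held constant.
- tderr_elig is computed literally as delta * e, following its attribute description.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TOGTDSketch:
    """Hypothetical stand-in for to_gtd.TOGTD: constant step sizes, no decay."""

    def __init__(self, num_features: int, alpha: float, beta: float, lmbda: float):
        n = num_features
        self.alpha, self.beta, self.lmbda = alpha, beta, lmbda
        self.theta: List[float] = [0.0] * n      # primary weights
        self.w: List[float] = [0.0] * n          # secondary weights
        self.e: List[float] = [0.0] * n          # dutch trace for theta
        self.e_grad: List[float] = [0.0] * n     # accumulating trace for gradient correction
        self.e_w: List[float] = [0.0] * n        # dutch trace for w
        self.old_theta: List[float] = [0.0] * n  # ASSUMPTION: theta_{t-1}, undocumented
        self.old_gamma = 0.0                     # gamma_t from the previous update call
        self.old_rho = 1.0                       # ASSUMPTION: rho_{t-1}, undocumented
        self.delta = 0.0
        self.tderr_elig: List[float] = [0.0] * n

    def predict(self, phi: Sequence[float]) -> float:
        # Linear value estimate theta^T phi.
        return dot(self.theta, phi)

    def update(self, phi, phi_prime, cumulant, gamma, rho) -> float:
        a, b, lam = self.alpha, self.beta, self.lmbda
        gl = self.old_gamma * lam  # gamma_t * lambda
        # TD error computed with the current weights theta_t.
        delta = cumulant + gamma * dot(self.theta, phi_prime) - dot(self.theta, phi)
        # Dutch trace for theta (step size folded in).
        scale_e = a * (1.0 - rho * gl * dot(self.e, phi))
        self.e = [rho * (gl * ei + scale_e * p) for ei, p in zip(self.e, phi)]
        # Accumulating trace used by the gradient-correction term.
        self.e_grad = [rho * (gl * gi + p) for gi, p in zip(self.e_grad, phi)]
        # Dutch trace for w, which uses the previous importance ratio rho_{t-1}.
        rgl = self.old_rho * gl
        scale_w = b * (1.0 - rgl * dot(self.e_w, phi))
        self.e_w = [rgl * hi + scale_w * p for hi, p in zip(self.e_w, phi)]
        # True online correction (theta_t - theta_{t-1})^T phi_t and GTD correction.
        diff = dot(self.theta, phi) - dot(self.old_theta, phi)
        corr = a * gamma * (1.0 - lam) * dot(self.w, self.e_grad)
        new_theta = [
            th + delta * ei + (ei - a * rho * p) * diff - corr * pp
            for th, ei, p, pp in zip(self.theta, self.e, phi, phi_prime)
        ]
        # Secondary weights, using w_t from before this step.
        w_phi = dot(self.w, phi)
        self.w = [wi + rho * delta * hi - b * w_phi * p
                  for wi, hi, p in zip(self.w, self.e_w, phi)]
        # Shift state for the next call.
        self.old_theta, self.theta = self.theta, new_theta
        self.old_gamma, self.old_rho = gamma, rho
        self.delta = delta
        self.tderr_elig = [delta * ei for ei in self.e]
        return delta
```

In use, one would call predict(phi) to read the current estimate. After observing phi_prime, the cumulant and the importance ratio rho for the transition, one would call update(phi, phi_prime, cumulant, gamma, rho). With rho = 0, both e and e_grad become zero, so theta is left unchanged for that step.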