auto_docking_policies module
Contains the policies needed for auto-docking.

Authors: Shibhansh Dohare, Niko Yasui.
class auto_docking_policies.AlternatingRotation(action_space, *args, **kwargs)
Bases: policy.Policy

A policy for task-3 of auto-docking.

Under this policy, the robot rotates in one direction for some time and then in the opposite direction for some other time, with the goal of aligning the robot with the docking station. It is a non-Markov policy and should be used as a behaviour policy. It is designed to improve exploration compared to Greedy or EGreedy for task-3 (aligning the robot with the docking station).

Parameters: action_space (numpy array of action) – Numpy array containing all actions available to any agent.
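The alternating-rotation idea can be sketched as below. This is an illustrative class, not the module's implementation: the internal step counter and the `steps_per_direction` parameter are assumptions used to show why the policy is non-Markov (the chosen action depends on how long the robot has been rotating, not just the current state).

```python
class AlternatingRotationSketch:
    """Rotate one way for a fixed number of steps, then the other way."""

    def __init__(self, action_space, steps_per_direction=20):
        self.action_space = action_space        # e.g. [turn_left, turn_right]
        self.steps_per_direction = steps_per_direction
        self.t = 0                              # internal counter: the non-Markov state

    def choose_action(self):
        # The direction flips every steps_per_direction timesteps.
        direction = (self.t // self.steps_per_direction) % 2
        self.t += 1
        return self.action_space[direction]
```

Because the counter `t` lives inside the policy rather than in the observed state, two identical observations can yield different actions, which is what makes it unsuitable as a target policy but useful for exploration.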
class auto_docking_policies.EGreedy(epsilon, action_space, value_function, feature_indices, *args, **kwargs)
Bases: policy.Policy

An epsilon-greedy policy.

Parameters:
- epsilon (float, optional) – Proportion of the time to take a random action. Default: 0 (greedy).
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- feature_indices (numpy array of bool) – Indices of the feature vector corresponding to the indices used by the value_function.
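A minimal sketch of the epsilon-greedy action choice follows. It is illustrative only: it assumes `value_function(features, action)` returns a scalar value and that `feature_indices` is a boolean mask over the feature vector `phi`, matching the parameter descriptions above but not necessarily the module's internals.

```python
import numpy as np

def egreedy_choose(epsilon, action_space, value_function, feature_indices, phi, rng=None):
    """With probability epsilon take a random action; otherwise act greedily."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return action_space[rng.integers(len(action_space))]
    # Exploit: evaluate each action on the relevant slice of the features.
    values = [value_function(phi[feature_indices], a) for a in action_space]
    return action_space[int(np.argmax(values))]
```

With `epsilon=0` this reduces to the pure greedy policy, which is why the default is described as greedy.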
class auto_docking_policies.ForwardIfClear(action_space, *args, **kwargs)
Bases: policy.Policy

An implementation of ForwardIfClear for task-2 of auto-docking.

Under this policy, the robot moves forward with high probability and turns with a small probability. It also turns if it was moving forward and encountered a bump. It is a Markov policy and should be used as a behaviour policy. It is designed to improve exploration compared to Greedy or EGreedy for task-2 (taking the robot to the center IR region).
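The rule above can be sketched as follows. The action names, the `p_turn` probability, and the `bumped` flag are illustrative assumptions; only the behaviour (mostly forward, occasionally turn, always turn after a bump while driving forward) comes from the description.

```python
import random

class ForwardIfClearSketch:
    """Drive forward with high probability; turn on a bump or at random."""

    def __init__(self, action_space, p_turn=0.1, rng=None):
        self.action_space = action_space   # assumed: ["forward", "turn"]
        self.p_turn = p_turn
        self.rng = rng if rng is not None else random.Random()
        self.last_action = None

    def choose_action(self, bumped):
        # Turn if the previous forward move hit an obstacle; otherwise
        # turn with small probability p_turn, else keep going forward.
        if bumped and self.last_action == "forward":
            action = "turn"
        elif self.rng.random() < self.p_turn:
            action = "turn"
        else:
            action = "forward"
        self.last_action = action
        return action
```

Since the choice depends only on the current bump reading and the immediately preceding action, it fits the Markov description given above.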
class auto_docking_policies.Switch(explorer, exploiter, num_timesteps_explore)

Switches between two policies.

Under this policy, the robot follows an exploring policy for some number of timesteps and then follows the learned greedy policy for some time. It is a non-Markov policy and should be used as a behaviour policy. It is designed to check, at regular intervals, how well the robot has learned.

Attributes:
- explorer (policy) – Policy to use for exploration.
- exploiter (policy) – Policy to use for exploitation.
- num_timesteps_explore (int) – Number of timesteps to run each policy before switching.
- t (int, not passed) – Counter. Switch policies when the counter reaches num_timesteps_explore.
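The switching mechanism can be sketched as below. Attribute names mirror the documented ones (`explorer`, `exploiter`, `num_timesteps_explore`, `t`), but the internals are an assumption; here each policy is represented as a callable that returns an action.

```python
class SwitchSketch:
    """Alternate between an exploring and an exploiting policy."""

    def __init__(self, explorer, exploiter, num_timesteps_explore):
        self.explorer = explorer
        self.exploiter = exploiter
        self.num_timesteps_explore = num_timesteps_explore
        self.t = 0  # counter; the active policy flips every num_timesteps_explore steps

    def choose_action(self):
        # Even phases explore, odd phases exploit.
        phase = (self.t // self.num_timesteps_explore) % 2
        self.t += 1
        policy = self.explorer if phase == 0 else self.exploiter
        return policy()
```

Running the exploiter at fixed intervals gives periodic snapshots of greedy performance, which matches the stated purpose of checking progress at regular intervals.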