auto_docking_policies module
Contains the policies needed for auto-docking.

Authors: Shibhansh Dohare, Niko Yasui.
class auto_docking_policies.AlternatingRotation(action_space, *args, **kwargs)
Bases: policy.Policy

A policy for task-3 of auto-docking.

Under this policy, the robot rotates in one direction for some time and then in the opposite direction for some other time, with the goal of aligning the robot with the docking station. It is a non-Markov policy and should be used as a behaviour policy. It is designed to improve exploration compared to Greedy or EGreedy for task-3 (aligning the robot with the docking station).

Parameters: action_space (numpy array of action) – Numpy array containing all actions available to any agent.
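The alternating-rotation idea can be sketched as below. This is an illustrative class, not the module's implementation: the internal step counter and the `steps_per_direction` parameter are assumptions used to show why the policy is non-Markov (the chosen action depends on how long the robot has been rotating, not just the current state).

```python
class AlternatingRotationSketch:
    """Rotate one way for a fixed number of steps, then the other way."""

    def __init__(self, action_space, steps_per_direction=20):
        self.action_space = action_space        # e.g. [turn_left, turn_right]
        self.steps_per_direction = steps_per_direction
        self.t = 0                              # internal counter: the non-Markov state

    def choose_action(self):
        # The direction flips every steps_per_direction timesteps.
        direction = (self.t // self.steps_per_direction) % 2
        self.t += 1
        return self.action_space[direction]
```

Because the counter `t` lives inside the policy rather than in the observed state, two identical observations can yield different actions, which is what makes it unsuitable as a target policy but useful for exploration.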
class auto_docking_policies.EGreedy(epsilon, action_space, value_function, feature_indices, *args, **kwargs)
Bases: policy.Policy

An epsilon-greedy policy.

Parameters:
- epsilon (float, optional) – Proportion of the time to take a random action. Default: 0 (greedy).
- action_space (numpy array of action) – Numpy array containing all actions available to any agent.
- value_function – A function used by the Policy to update the values of pi. This is usually a value function learned by a GVF.
- feature_indices (numpy array of bool) – Indices of the feature vector corresponding to the indices used by the value_function.
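A minimal sketch of the epsilon-greedy action choice follows. It is illustrative only: it assumes `value_function(features, action)` returns a scalar value and that `feature_indices` is a boolean mask over the feature vector `phi`, matching the parameter descriptions above but not necessarily the module's internals.

```python
import numpy as np

def egreedy_choose(epsilon, action_space, value_function, feature_indices, phi, rng=None):
    """With probability epsilon take a random action; otherwise act greedily."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return action_space[rng.integers(len(action_space))]
    # Exploit: evaluate each action on the relevant slice of the features.
    values = [value_function(phi[feature_indices], a) for a in action_space]
    return action_space[int(np.argmax(values))]
```

With `epsilon=0` this reduces to the pure greedy policy, which is why the default is described as greedy.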
class auto_docking_policies.ForwardIfClear(action_space, *args, **kwargs)
Bases: policy.Policy

An implementation of ForwardIfClear for task-2 of auto-docking.

Under this policy, the robot moves forward with high probability and turns with a small probability. It also turns if it was moving forward and encountered a bump. It is a Markov policy and should be used as a behaviour policy. It is designed to improve exploration compared to Greedy or EGreedy for task-2 (taking the robot to the center IR region).
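The rule above can be sketched as follows. The action names, the `p_turn` probability, and the `bumped` flag are illustrative assumptions; only the behaviour (mostly forward, occasionally turn, always turn after a bump while driving forward) comes from the description.

```python
import random

class ForwardIfClearSketch:
    """Drive forward with high probability; turn on a bump or at random."""

    def __init__(self, action_space, p_turn=0.1, rng=None):
        self.action_space = action_space   # assumed: ["forward", "turn"]
        self.p_turn = p_turn
        self.rng = rng if rng is not None else random.Random()
        self.last_action = None

    def choose_action(self, bumped):
        # Turn if the previous forward move hit an obstacle; otherwise
        # turn with small probability p_turn, else keep going forward.
        if bumped and self.last_action == "forward":
            action = "turn"
        elif self.rng.random() < self.p_turn:
            action = "turn"
        else:
            action = "forward"
        self.last_action = action
        return action
```

Since the choice depends only on the current bump reading and the immediately preceding action, it fits the Markov description given above.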
class auto_docking_policies.Switch(explorer, exploiter, num_timesteps_explore)

Switches between two policies.

Under this policy, the robot follows an exploring policy for some number of timesteps and then follows the learned greedy policy for some time. It is a non-Markov policy and should be used as a behaviour policy. It is designed to check, at regular intervals, how well the robot has learned.

Attributes:
- explorer (policy) – Policy to use for exploration.
- exploiter (policy) – Policy to use for exploitation.
- num_timesteps_explore (int) – Number of timesteps to run each policy before switching.
- t (int, not passed) – Counter. Switch policies when the counter reaches num_timesteps_explore.
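The switching mechanism can be sketched as below. Attribute names mirror the documented ones (`explorer`, `exploiter`, `num_timesteps_explore`, `t`), but the internals are an assumption; here each policy is represented as a callable that returns an action.

```python
class SwitchSketch:
    """Alternate between an exploring and an exploiting policy."""

    def __init__(self, explorer, exploiter, num_timesteps_explore):
        self.explorer = explorer
        self.exploiter = exploiter
        self.num_timesteps_explore = num_timesteps_explore
        self.t = 0  # counter; the active policy flips every num_timesteps_explore steps

    def choose_action(self):
        # Even phases explore, odd phases exploit.
        phase = (self.t // self.num_timesteps_explore) % 2
        self.t += 1
        policy = self.explorer if phase == 0 else self.exploiter
        return policy()
```

Running the exploiter at fixed intervals gives periodic snapshots of greedy performance, which matches the stated purpose of checking progress at regular intervals.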