torchrl.contrib.controllers package

Submodules

torchrl.contrib.controllers.a2c_controller module

class torchrl.contrib.controllers.a2c_controller.A2CController(obs_size, action_size, gamma=0.99, lmbda=1.0, lr=0.001, alpha=0.5, beta=1.0, device=None)[source]

Bases: torchrl.controllers.controller.Controller

act(obs)[source]
compute_return(obs, action, reward, next_obs, done)[source]
learn(obs, action, reward, next_obs, done, returns)[source]

Runs one A2C learning update on a batch of transitions, using the returns produced by compute_return. (The generic placeholder docstring is inherited from the Controller base class.)
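
A minimal usage sketch with dummy data. The batch shapes, dtypes, and the assumption that act accepts a batch of observations are illustrative guesses, not documented behaviour:

   import numpy as np
   from torchrl.contrib.controllers.a2c_controller import A2CController

   controller = A2CController(obs_size=4, action_size=2)

   # Dummy rollout batch of 8 transitions; real code would collect these
   # from an environment, choosing actions via controller.act(obs).
   obs      = np.random.randn(8, 4).astype(np.float32)
   action   = controller.act(obs)
   next_obs = np.random.randn(8, 4).astype(np.float32)
   reward   = np.ones((8, 1), dtype=np.float32)
   done     = np.zeros((8, 1), dtype=np.float32)

   returns = controller.compute_return(obs, action, reward, next_obs, done)
   controller.learn(obs, action, reward, next_obs, done, returns)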

torchrl.contrib.controllers.ddpg_controller module

class torchrl.contrib.controllers.ddpg_controller.DDPGController(obs_size, action_size, action_low, action_high, actor_lr=0.0001, critic_lr=0.001, gamma=0.99, tau=0.01, n_reset_interval=100, device=None)[source]

Bases: torchrl.controllers.controller.Controller

act(obs)[source]
learn(obs, action, reward, next_obs, done)[source]

Runs one DDPG learning update on a batch of transitions: a critic step toward the TD target and an actor step, with target networks tracked by Polyak averaging.
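
A minimal usage sketch with dummy data, assuming act maps a batch of observations to continuous actions within [action_low, action_high]; shapes are illustrative, not documented:

   import numpy as np
   from torchrl.contrib.controllers.ddpg_controller import DDPGController

   controller = DDPGController(obs_size=3, action_size=1,
                               action_low=-2.0, action_high=2.0)

   # Dummy transition batch; a real loop would gather these from the environment.
   obs      = np.random.randn(8, 3).astype(np.float32)
   action   = controller.act(obs)
   next_obs = np.random.randn(8, 3).astype(np.float32)
   reward   = np.zeros((8, 1), dtype=np.float32)
   done     = np.zeros((8, 1), dtype=np.float32)

   controller.learn(obs, action, reward, next_obs, done)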

class torchrl.contrib.controllers.ddpg_controller.OUNoise(action_dim, mu=0.0, theta=0.15, max_sigma=0.3, min_sigma=0.3, decay_period=100000)[source]

Bases: object

evolve_state()[source]
get_action(action)[source]
reset()[source]
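
A usage sketch for exploration, assuming (as is conventional for this class, though not documented here) that get_action perturbs a deterministic action with the current Ornstein-Uhlenbeck state:

   import numpy as np
   from torchrl.contrib.controllers.ddpg_controller import OUNoise

   noise = OUNoise(action_dim=1, theta=0.15, max_sigma=0.3)
   noise.reset()                        # restart the process at the mean mu

   deterministic_action = np.zeros(1)   # stand-in for an actor-network output
   for step in range(100):
       noisy_action = noise.get_action(deterministic_action)
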
torchrl.contrib.controllers.ddpg_controller.polyak_average_(source, target, tau=0.001)[source]

In-place Polyak average from the source module to the target module.

Parameters:
   * source – source module

   * target – target module (updated in place)

   * tau – Polyak averaging parameter
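
The package's own implementation is not reproduced here, but the documented semantics match the standard in-place soft update, sketched below for reference:

   import torch
   import torch.nn as nn

   def polyak_average_(source: nn.Module, target: nn.Module, tau: float = 0.001):
       # target <- tau * source + (1 - tau) * target, parameter by parameter
       with torch.no_grad():
           for t_p, s_p in zip(target.parameters(), source.parameters()):
               t_p.mul_(1.0 - tau).add_(tau * s_p)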

torchrl.contrib.controllers.dqn_controller module

class torchrl.contrib.controllers.dqn_controller.DQNController(obs_size, action_size, double_dqn=False, gamma=0.99, lr=0.001, eps_max=1.0, eps_min=0.01, n_eps_anneal=1000, n_update_interval=5, device=None)[source]

Bases: torchrl.controllers.controller.Controller

act(obs)[source]
learn(obs, action, reward, next_obs, done)[source]

Runs one Q-learning update (double DQN when double_dqn=True) on a batch of transitions.
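
A minimal usage sketch with dummy data; batched numpy inputs and shapes are assumptions. Exploration (the eps_max to eps_min annealing schedule) is presumably handled inside act:

   import numpy as np
   from torchrl.contrib.controllers.dqn_controller import DQNController

   controller = DQNController(obs_size=4, action_size=2, double_dqn=True)

   obs      = np.random.randn(8, 4).astype(np.float32)
   action   = controller.act(obs)       # epsilon-greedy over predicted Q-values
   next_obs = np.random.randn(8, 4).astype(np.float32)
   reward   = np.ones((8, 1), dtype=np.float32)
   done     = np.zeros((8, 1), dtype=np.float32)

   controller.learn(obs, action, reward, next_obs, done)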

torchrl.contrib.controllers.dqn_controller.epsilon_greedy(action_size: int, choices: numpy.core.multiarray.array, eps: float = 0.1)[source]

Batched epsilon-greedy action selection.

Parameters:
   * action_size – total number of actions

   * choices – array of action choices, one per batch element

   * eps – value of epsilon, the probability of replacing a choice with a random action
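
A reference sketch consistent with this signature and docstring; the package's actual implementation may differ:

   import numpy as np

   def epsilon_greedy(action_size: int, choices: np.ndarray, eps: float = 0.1):
       # For each batch element, keep the given choice with probability 1 - eps,
       # otherwise substitute a uniformly random action.
       explore = np.random.random(len(choices)) < eps
       random_actions = np.random.randint(action_size, size=len(choices))
       return np.where(explore, random_actions, choices)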

torchrl.contrib.controllers.ppo_controller module

class torchrl.contrib.controllers.ppo_controller.PPOController(obs_size, action_size, gamma=0.99, lmbda=1.0, lr=0.001, alpha=0.5, beta=1.0, clip_ratio=0.2, max_grad_norm=1.0, device=None)[source]

Bases: torchrl.controllers.controller.Controller

act(obs)[source]
compute_return(obs, action, reward, next_obs, done)[source]
learn(obs, action, reward, next_obs, done, returns, old_log_probs, advantages)[source]

Runs one clipped-surrogate PPO update on a batch of transitions, using precomputed returns, the old policy's log-probabilities, and advantage estimates.
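
A minimal usage sketch with dummy data. This page does not document where old_log_probs and advantages come from (in a real loop they would be recorded during the rollout), so zero-filled placeholders are used purely to show the call shape:

   import numpy as np
   from torchrl.contrib.controllers.ppo_controller import PPOController

   controller = PPOController(obs_size=4, action_size=2, clip_ratio=0.2)

   obs      = np.random.randn(8, 4).astype(np.float32)
   action   = controller.act(obs)
   next_obs = np.random.randn(8, 4).astype(np.float32)
   reward   = np.ones((8, 1), dtype=np.float32)
   done     = np.zeros((8, 1), dtype=np.float32)

   returns = controller.compute_return(obs, action, reward, next_obs, done)
   old_log_probs = np.zeros((8, 1), dtype=np.float32)  # placeholder: record at rollout time
   advantages    = np.zeros((8, 1), dtype=np.float32)  # placeholder: e.g. GAE with lmbda
   controller.learn(obs, action, reward, next_obs, done,
                    returns, old_log_probs, advantages)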

Module contents