torchrl.agents package

Submodules

torchrl.agents.a2c_agent module

class torchrl.agents.a2c_agent.BaseA2CAgent(observation_space, action_space, lr=0.001, gamma=0.99, lmbda=1.0, alpha=0.5, beta=1.0)[source]

Bases: torchrl.agents.base_agent.BaseAgent

act(obs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

compute_returns(obs, action, reward, next_obs, done)[source]
learn(obs, action, reward, next_obs, done, returns)[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.
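
A minimal usage sketch (assumptions: a classic OpenAI Gym reset/step API, that act accepts a single observation and returns an action env.step understands, and that compute_returns and learn take the rollout as parallel batches; exact shapes and return types depend on the concrete implementation):

    import gym
    from torchrl.agents.a2c_agent import BaseA2CAgent

    env = gym.make('CartPole-v1')
    agent = BaseA2CAgent(env.observation_space, env.action_space,
                         lr=0.001, gamma=0.99, lmbda=1.0, alpha=0.5, beta=1.0)

    # Roll out one episode and store the transitions as parallel lists (a batch).
    transitions = []
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs

    obs_b, act_b, rew_b, next_b, done_b = map(list, zip(*transitions))
    returns = agent.compute_returns(obs_b, act_b, rew_b, next_b, done_b)
    agent.learn(obs_b, act_b, rew_b, next_b, done_b, returns)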

torchrl.agents.base_agent module

class torchrl.agents.base_agent.BaseAgent(observation_space, action_space)[source]

Bases: object

This is the base agent specification, which encapsulates everything about how a Reinforcement Learning algorithm functions.

act(*args, **kwargs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

learn(*args, **kwargs) → dict[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.

obs_to_tensor(obs)[source]
reset()[source]

Optional method to reset the learner’s internal state.

to(device: torch.device)[source]

This routine takes the agent’s models attribute and sends each model to the given device.

See https://pytorch.org/docs/stable/nn.html#torch.nn.Module.to.

Parameters: device (torch.device) – target device to move the agent’s models to
Returns: updated class reference.
train(flag: bool = True)[source]

This routine takes the agent’s models attribute and applies the training flag to each model.

See https://pytorch.org/docs/stable/nn.html#torch.nn.Module.train.

Parameters: flag (bool) – True for training mode, False for evaluation mode
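
A minimal subclass sketch of the interface above (illustrative only and not part of the library: models and checkpoint are written as read-only properties to match the attribute-style listing, and the to/train calls at the end rely on the behaviour described for those routines):

    import torch
    from torchrl.agents.base_agent import BaseAgent

    class ConstantAgent(BaseAgent):
        """Toy agent that always picks action 0 (illustrative only)."""

        def __init__(self, observation_space, action_space):
            super().__init__(observation_space, action_space)
            self.net = torch.nn.Linear(4, 2)          # placeholder trainable network

        @property
        def models(self):
            return [self.net]                         # networks that to()/train() operate on

        @property
        def checkpoint(self):
            return {'net': self.net.state_dict()}     # complete restorable state

        def act(self, obs):
            return 0                                  # fixed action; batch handling omitted

        def learn(self, *args, **kwargs) -> dict:
            return {}                                 # no learning in this toy agent

    agent = ConstantAgent(None, None)                 # spaces unused by this toy agent
    agent.to(torch.device('cpu'))                     # sends every model in agent.models to the device
    agent.train(False)                                # puts every model into evaluation mode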

torchrl.agents.ddpg_agent module

class torchrl.agents.ddpg_agent.BaseDDPGAgent(observation_space, action_space, actor_lr=0.0001, critic_lr=0.001, gamma=0.99, tau=0.01)[source]

Bases: torchrl.agents.base_agent.BaseAgent

act(obs, **kwargs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

clip_action(action: numpy.array)[source]
learn(obs, action, reward, next_obs, done, **kwargs)[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.

reset()[source]

Optional method to reset the learner’s internal state.
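
A minimal interaction sketch for a continuous-control environment (assumptions: classic Gym API, clip_action bounding a NumPy action to the action space, and one learn call per transition, although a replay buffer would normally sit in between):

    import gym
    from torchrl.agents.ddpg_agent import BaseDDPGAgent

    env = gym.make('Pendulum-v1')
    agent = BaseDDPGAgent(env.observation_space, env.action_space,
                          actor_lr=0.0001, critic_lr=0.001, gamma=0.99, tau=0.01)

    obs, done = env.reset(), False
    agent.reset()                                    # clear any per-episode internals
    while not done:
        action = agent.clip_action(agent.act(obs))   # bound the action to the action space
        next_obs, reward, done, _ = env.step(action)
        agent.learn(obs, action, reward, next_obs, done)
        obs = next_obs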

torchrl.agents.ddpg_agent.polyak_average_(source, target, tau=0.001)[source]

In-place Polyak average from the source module to the target module.

Parameters:
source – source module
target – target module
tau – Polyak averaging parameter
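
For reference, the soft update this describes is, per parameter pair, target = (1 - tau) * target + tau * source under the usual DDPG convention (a small tau moves the target slowly toward the source). A minimal sketch of an equivalent in-place update, not necessarily the library’s exact implementation:

    import torch

    def polyak_average_sketch(source: torch.nn.Module, target: torch.nn.Module, tau=0.001):
        """Illustrative in-place soft update: target <- (1 - tau) * target + tau * source."""
        with torch.no_grad():
            for src_p, tgt_p in zip(source.parameters(), target.parameters()):
                tgt_p.mul_(1.0 - tau).add_(src_p, alpha=tau)

With tau=1.0 this reduces to a hard copy of the source parameters into the target.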

torchrl.agents.dqn_agent module

class torchrl.agents.dqn_agent.BaseDQNAgent(observation_space, action_space, double_dqn=False, gamma=0.99, lr=0.001, eps_max=1.0, eps_min=0.01, num_eps_steps=1000, target_update_interval=5)[source]

Bases: torchrl.agents.base_agent.BaseAgent

act(obs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

compute_q_values(obs, action, reward, next_obs, done)[source]
learn(obs, action, reward, next_obs, done, td_error)[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.
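
A minimal interaction sketch (assumptions: classic Gym API, transitions passed as length-1 batches, and compute_q_values producing the td_error quantity that learn expects; in practice transitions would be sampled from a replay buffer):

    import gym
    from torchrl.agents.dqn_agent import BaseDQNAgent

    env = gym.make('CartPole-v1')
    agent = BaseDQNAgent(env.observation_space, env.action_space,
                         double_dqn=True, gamma=0.99, lr=0.001,
                         eps_max=1.0, eps_min=0.01, num_eps_steps=1000,
                         target_update_interval=5)

    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)                      # epsilon-greedy action
        next_obs, reward, done, _ = env.step(action)
        batch = ([obs], [action], [reward], [next_obs], [done])
        td_error = agent.compute_q_values(*batch)    # assumed to yield the td_error input of learn
        agent.learn(*batch, td_error)
        obs = next_obs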

torchrl.agents.gym_random_agent module

class torchrl.agents.gym_random_agent.GymRandomAgent(observation_space, action_space)[source]

Bases: torchrl.agents.base_agent.BaseAgent

Take random actions on a Gym environment.

This is only tested on the Classic Control environments from OpenAI Gym and is only meant as a starting point when working with new environments.

act(obs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

learn(*args, **kwargs)[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.
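
A minimal sanity-check loop with the random agent (classic Gym API assumed):

    import gym
    from torchrl.agents.gym_random_agent import GymRandomAgent

    env = gym.make('CartPole-v1')
    agent = GymRandomAgent(env.observation_space, env.action_space)

    obs, done, episode_return = env.reset(), False, 0.0
    while not done:
        action = agent.act(obs)                  # random action from the action space
        obs, reward, done, _ = env.step(action)
        episode_return += reward
    print('episode return:', episode_return)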

torchrl.agents.ppo_agent module

class torchrl.agents.ppo_agent.BasePPOAgent(observation_space, action_space, lr=0.001, gamma=0.99, lmbda=0.01, alpha=0.5, beta=1.0, clip_ratio=0.2, max_grad_norm=1.0)[source]

Bases: torchrl.agents.base_agent.BaseAgent

act(obs)[source]

This is the method that should be called at every step of the episode. IMPORTANT: this method must be compatible with batched observations.

Parameters: obs – representation of the current state (observation)
Returns: the action to be taken, in the format expected by the environment
checkpoint

This method must return an arbitrary object that captures the complete state of the agent, so that it can be restored at any point in time.

compute_returns(obs, action, reward, next_obs, done)[source]
learn(obs, action, reward, next_obs, done, returns, old_log_probs, advantages)[source]

This method represents the learning step

models

This routine must return the list of trainable networks that external routines might want to operate on.
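
A constructor-level sketch; compared to A2C, learn additionally expects old_log_probs and advantages, which the surrounding rollout code must supply (how they are produced, e.g. whether act also returns log-probabilities, depends on the concrete implementation):

    import gym
    from torchrl.agents.ppo_agent import BasePPOAgent

    env = gym.make('CartPole-v1')
    agent = BasePPOAgent(env.observation_space, env.action_space,
                         lr=0.001, gamma=0.99, lmbda=0.01, alpha=0.5, beta=1.0,
                         clip_ratio=0.2, max_grad_norm=1.0)

    # Given a collected rollout (obs_b, act_b, rew_b, next_b, done_b), plus the
    # log-probabilities of the taken actions and their advantage estimates:
    #   returns = agent.compute_returns(obs_b, act_b, rew_b, next_b, done_b)
    #   agent.learn(obs_b, act_b, rew_b, next_b, done_b,
    #               returns, old_log_probs, advantages)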

Module contents