torchrl.problems package

Submodules

torchrl.problems.a2c module

class torchrl.problems.a2c.A2CProblem(hparams: torchrl.registry.problems.HParams, problem_args: argparse.Namespace, log_dir: str, device: str = 'cuda', show_progress: bool = True, checkpoint_prefix='checkpoint')[source]

Bases: torchrl.problems.gym_problem.GymProblem

train(history_list: list)[source]

This method must be overridden by the derived Problem class and should contain the core idea behind the training step.

There are no restrictions on what is passed through this argument, as long as the derived class handles it consistently. Typically it is a list of rollouts (possibly one per parallel trajectory), where each rollout carries all relevant values - observation, action, reward, next observation, termination flag, and potentially other information. This raw data must be processed as desired; see hist_to_tensor() for a sample routine.

Note

It is a good idea to always use train() appropriately here.

Parameters: history_list (list) – A list of histories. This will typically be returned by the rollout() method of the runner.
Returns: A Python dictionary containing labeled losses.
Return type: dict
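
For illustration, a minimal sketch of what such an override might look like. The self.agent attribute, its compute_loss() helper, the self.device attribute, and the per-transition tuple layout are assumptions for this sketch only and are not part of the documented API.

from torchrl.problems.gym_problem import GymProblem

class MyA2CProblem(GymProblem):
    def train(self, history_list: list) -> dict:
        # Convert raw rollout histories into tensors on the target device
        # (see hist_to_tensor() for a sample routine).
        history_list = self.hist_to_tensor(history_list, device=self.device)

        actor_loss_total, critic_loss_total = 0.0, 0.0
        for obs, action, reward, next_obs, done in history_list:  # assumed tuple layout
            # Hypothetical agent API; substitute your own loss computation here.
            actor_loss, critic_loss = self.agent.compute_loss(
                obs, action, reward, next_obs, done)
            actor_loss_total += actor_loss.item()
            critic_loss_total += critic_loss.item()

        # Return labeled losses as a plain dictionary.
        n = max(len(history_list), 1)
        return {'actor_loss': actor_loss_total / n,
                'critic_loss': critic_loss_total / n}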

torchrl.problems.base_hparams module

torchrl.problems.base_hparams.base()[source]
torchrl.problems.base_hparams.base_ddpg()[source]
torchrl.problems.base_hparams.base_dqn()[source]
torchrl.problems.base_hparams.base_pg()[source]
torchrl.problems.base_hparams.base_ppo()[source]
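
A hedged usage sketch of these presets; attribute-style assignment on the returned HParams object and the exact field names are assumptions.

from torchrl.problems import base_hparams

# Start from the PPO defaults and override a few fields.
hparams = base_hparams.base_ppo()
hparams.env_id = 'CartPole-v1'  # required by GymProblem (documented below)
hparams.seed = 42               # illustrative only; field names may differ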

torchrl.problems.ddpg module

class torchrl.problems.ddpg.DDPGProblem(hparams, problem_args, *args, **kwargs)[source]

Bases: torchrl.problems.gym_problem.GymProblem

train(history_list: list)[source]

This method must be overridden by the derived Problem class and should contain the core idea behind the training step.

There are no restrictions on what is passed through this argument, as long as the derived class handles it consistently. Typically it is a list of rollouts (possibly one per parallel trajectory), where each rollout carries all relevant values - observation, action, reward, next observation, termination flag, and potentially other information. This raw data must be processed as desired; see hist_to_tensor() for a sample routine.

Note

It is a good idea to always use train() appropriately here.

Parameters: history_list (list) – A list of histories. This will typically be returned by the rollout() method of the runner.
Returns: A Python dictionary containing labeled losses.
Return type: dict

torchrl.problems.dqn module

class torchrl.problems.dqn.DQNProblem(hparams, problem_args, *args, **kwargs)[source]

Bases: torchrl.problems.gym_problem.GymProblem

train(history_list: list)[source]

This method must be overridden by the derived Problem class and should contain the core idea behind the training step.

There are no restrictions on what is passed through this argument, as long as the derived class handles it consistently. Typically it is a list of rollouts (possibly one per parallel trajectory), where each rollout carries all relevant values - observation, action, reward, next observation, termination flag, and potentially other information. This raw data must be processed as desired; see hist_to_tensor() for a sample routine.

Note

It is a good idea to always use train() appropriately here.

Parameters: history_list (list) – A list of histories. This will typically be returned by the rollout() method of the runner.
Returns: A Python dictionary containing labeled losses.
Return type: dict

torchrl.problems.gym_problem module

class torchrl.problems.gym_problem.GymProblem(hparams: torchrl.registry.problems.HParams, problem_args: argparse.Namespace, log_dir: str, device: str = 'cuda', show_progress: bool = True, checkpoint_prefix='checkpoint')[source]

Bases: torchrl.registry.problems.Problem

Implements a Problem class to handle Gym-related environments.

Note

Any HParams supplied to this problem must have the env_id property set to a valid Gym environment identifier.

eval(epoch)[source]

This preset routine simply runs the agent and performs some evaluations.

Parameters: epoch (int) – Epoch number.
Returns: Average reward and standard deviation.
Return type: tuple
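
A hedged usage sketch of this hook inside a training loop; problem is assumed to be an instantiated GymProblem subclass, and the interval values are arbitrary.

num_epochs, eval_interval = 100, 10

for epoch in range(1, num_epochs + 1):
    # ... one training step per epoch (see train() above) ...
    if epoch % eval_interval == 0:
        avg_reward, std_reward = problem.eval(epoch)  # documented (mean, std) tuple
        print('epoch {}: reward {:.2f} +/- {:.2f}'.format(
            epoch, avg_reward, std_reward))
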
static hist_to_tensor(history_list, device: torch.device = 'cuda')[source]

A utility method to convert a list of histories to PyTorch tensors. It also sends the tensors to the target device.

Parameters:
  • history_list (list) – List of histories for each parallel trajectory.
  • device (torch.device) – PyTorch device object.
Returns: A list of tuples where each tuple represents the history item.
Return type: list
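
A rough, self-contained sketch of what such a conversion might look like; the actual stacking order and dtype handling inside GymProblem are assumptions.

import numpy as np
import torch

def hist_to_tensor_sketch(history_list, device='cpu'):
    # Each history is assumed to be a tuple of per-field arrays:
    # (observation, action, reward, next_observation, done).
    converted = []
    for history in history_list:
        converted.append(tuple(
            torch.as_tensor(np.asarray(field)).to(device)
            for field in history))
    return converted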

make_runner(n_envs=1, seed=None) → torchrl.runners.base_runner.BaseRunner[source]

Create a runner wrapping a set of parallel environments.

Parameters:
  • n_envs (int) – Number of parallel environments.
  • seed (int) – Optional base integer to seed environments with.
Returns: An instantiated runner object.
Return type: GymRunner
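
A hedged end-to-end sketch; problem is assumed to be an instantiated GymProblem subclass, and while rollout() is referenced in the train() documentation above, its exact signature (including the agent argument) is an assumption.

runner = problem.make_runner(n_envs=4, seed=1)

# Collect one batch of parallel-environment histories and train on it.
history_list = runner.rollout(problem.agent)  # assumed signature
losses = problem.train(history_list)
print(losses)  # e.g. {'actor_loss': ..., 'critic_loss': ...}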

static merge_histories(*history_list)[source]

A utility function which merges histories from all the parallel environments.

Parameters: *history_list (list) – Histories from each of the parallel environments.
Returns: A single tuple which effectively transposes the history of transition tuples.
Return type: tuple
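
A self-contained sketch of the transposition this describes; the real implementation may additionally concatenate per-field values across environments.

def merge_histories_sketch(*history_list):
    # zip(*...) transposes per-environment histories so that all observations,
    # all actions, etc. end up grouped together in a single tuple.
    return tuple(zip(*history_list))

# Two parallel environments, each with (obs, action, reward) fields:
h1 = ([0.1, 0.2], [0, 1], [1.0, 0.0])
h2 = ([0.3, 0.4], [1, 0], [0.0, 1.0])
observations, actions, rewards = merge_histories_sketch(h1, h2)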

torchrl.problems.ppo module

class torchrl.problems.ppo.PPOProblem(hparams: torchrl.registry.problems.HParams, problem_args: argparse.Namespace, log_dir: str, device: str = 'cuda', show_progress: bool = True, checkpoint_prefix='checkpoint')[source]

Bases: torchrl.problems.gym_problem.GymProblem

train(history_list: list)[source]

This method must be overridden by the derived Problem class and should contain the core idea behind the training step.

There are no restrictions on what is passed through this argument, as long as the derived class handles it consistently. Typically it is a list of rollouts (possibly one per parallel trajectory), where each rollout carries all relevant values - observation, action, reward, next observation, termination flag, and potentially other information. This raw data must be processed as desired; see hist_to_tensor() for a sample routine.

Note

It is a good idea to always use train() appropriately here.

Parameters: history_list (list) – A list of histories. This will typically be returned by the rollout() method of the runner.
Returns: A Python dictionary containing labeled losses.
Return type: dict

torchrl.problems.prioritized_dqn module

class torchrl.problems.prioritized_dqn.PrioritizedDQNProblem(hparams, problem_args, *args, **kwargs)[source]

Bases: torchrl.problems.gym_problem.GymProblem

train(history_list: list)[source]

This method must be overridden by the derived Problem class and should contain the core idea behind the training step.

There are no restrictions on what is passed through this argument, as long as the derived class handles it consistently. Typically it is a list of rollouts (possibly one per parallel trajectory), where each rollout carries all relevant values - observation, action, reward, next observation, termination flag, and potentially other information. This raw data must be processed as desired; see hist_to_tensor() for a sample routine.

Note

It is a good idea to always use train() appropriately here.

Parameters: history_list (list) – A list of histories. This will typically be returned by the rollout() method of the runner.
Returns: A Python dictionary containing labeled losses.
Return type: dict

Module contents