torchrl.runners package


torchrl.runners.base_runner module

class torchrl.runners.base_runner.BaseRunner[source]

Bases: object

This class defines how any environment must be executed to generate trajectories. The MAX_STEPS property must be respected by any derived runner to prevent infinite-horizon trajectories during rollouts.

MAX_STEPS = 1000000

close()[source]

Clean up any artifacts created by the runner. Typically, this involves shutting down the environments and stopping the parallel trajectory threads.

compute_action(agent: torchrl.agents.base_agent.BaseAgent, obs_list: list)[source]

This helper method must be overridden by any derived class. It allows for flexible runners where any pre-/post-processing might be needed before/after the agent's act() is called.

Parameters:
  • agent (BaseAgent) – Any derived agent.
  • obs_list (list) – A list of observations corresponding to each parallel environment.

Returns:A (potentially post-processed) action returned by any BaseAgent.
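A minimal sketch of what a compute_action override might look like, batching observations before calling the agent's act(). StubAgent and MyRunner are hypothetical stand-ins for illustration, not the real torchrl classes:

```python
class StubAgent:
    """Stand-in for torchrl.agents.base_agent.BaseAgent (hypothetical)."""
    def act(self, obs_batch):
        # Toy policy: action 1 if the first feature is positive, else 0.
        return [1 if obs[0] > 0 else 0 for obs in obs_batch]


class MyRunner:
    """Hypothetical derived runner illustrating compute_action()."""
    def compute_action(self, agent, obs_list):
        # Pre-process: here the list is forwarded as-is; a real runner
        # might convert it to a tensor or normalize observations first.
        actions = agent.act(obs_list)
        # Post-process: convert/clip actions as the environment expects.
        return list(actions)


runner = MyRunner()
actions = runner.compute_action(StubAgent(), [[0.5], [-0.3]])
# actions == [1, 0]
```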

make_env(seed: int = None)[source]

This method must be overridden by a derived class to create the environment. For uniformity, any subsequent usage of the environment must go through the runner so that runs are reproducible (for instance with respect to arguments like seed).

Parameters:seed (int) – Optional seed for the environment creation.
Returns:An object representing the environment. For instance it could be of type gym.Env.
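A sketch of a make_env override under the assumption that a seeded environment fully determines later trajectories. ToyEnv is a hypothetical stand-in for a real environment such as gym.Env:

```python
import random


class ToyEnv:
    """Minimal stand-in environment (hypothetical, not gym.Env)."""
    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def reset(self):
        return [self._rng.random()]


class MyRunner:
    """Hypothetical derived runner illustrating make_env()."""
    def make_env(self, seed=None):
        # All later interaction should go through the runner so that the
        # seed passed here fully determines the generated trajectories.
        return ToyEnv(seed=seed)


env_a = MyRunner().make_env(seed=42)
env_b = MyRunner().make_env(seed=42)
# Identical seeds yield identical initial observations.
```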
process_transition(history, transition: tuple) → list[source]

This helper method must be overridden by any derived class. Effectively, this method should take in all previous history and append the current transition tuple.


Note: The first call to this method will have history as None. This allows for flexibility in terms of the storage format of history. Make sure to handle this case and initialize the storage as desired. See GymRunner for an example.

Parameters:
  • history – A set of history items. The derived class is free to choose any type.
  • transition (tuple) – A transition tuple which represents the current observation, action, reward, next observation and termination flag. Typically, this is a 5-tuple; however, the derived class is free to add more information here as long as it is handled appropriately.

Returns:The updated history object.
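The contract above can be sketched as follows, including the required handling of the first call where history is None. The list-based storage is one possible choice; the real GymRunner may store history differently:

```python
class MyRunner:
    """Hypothetical derived runner illustrating process_transition()."""
    def process_transition(self, history, transition):
        # The first call receives history=None; initialize storage here.
        if history is None:
            history = []
        history.append(transition)
        return history


runner = MyRunner()
h = runner.process_transition(None, ("obs", 0, 1.0, "next_obs", False))
h = runner.process_transition(h, ("next_obs", 1, 0.0, "obs2", True))
# h now holds two 5-tuples of (obs, action, reward, next_obs, done).
```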

rollout(agent, steps: int = None, render: bool = False, fps: int = 30)[source]

This is the main entry point for a runner object. Given an agent, it rolls out a trajectory of the specified length, optionally rendering the environment.


Note: Care must be taken when an environment reaches its terminal state. This could mean transparently resetting the environment or handling termination by other means. See GymRunner for an example, which resets the environment as and when needed.

Parameters:
  • agent (BaseAgent) – Any derived agent object.
  • steps (int) – An optional maximum number of steps to roll out the environments for. If None, MAX_STEPS is used.
  • render (bool) – A flag to render the environment.
  • fps (int) – Target frames per second when rendering; determines the pause between consecutive rendered frames.

Returns:A list of all objects needed for each parallel environment. Typically, this would be the full trajectory for each environment, defined by a list of transition tuples. See GymRunner for a concrete example.


Note: The render flag does not work across multiple threads while debugging. Tracked by #53.
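Putting the pieces together, a rollout loop along the lines described above might look like the following self-contained sketch. ToyEnv, ToyRunner and StubAgent are hypothetical; the reset-on-termination behavior mirrors what the docs describe for GymRunner:

```python
class ToyEnv:
    """Hypothetical environment that terminates every 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done


class StubAgent:
    """Stand-in agent returning a fixed action per observation."""
    def act(self, obs_list):
        return [0 for _ in obs_list]


class ToyRunner:
    """Hypothetical runner illustrating rollout() with MAX_STEPS cap."""
    MAX_STEPS = 1000000

    def __init__(self):
        self.env = ToyEnv()
        self.obs = self.env.reset()

    def process_transition(self, history, transition):
        history = [] if history is None else history
        history.append(transition)
        return history

    def rollout(self, agent, steps=None):
        steps = steps if steps is not None else self.MAX_STEPS
        history = None
        for _ in range(steps):
            action = agent.act([self.obs])[0]
            next_obs, reward, done = self.env.step(action)
            history = self.process_transition(
                history, (self.obs, action, reward, next_obs, done))
            # Transparently reset on termination, as GymRunner does.
            self.obs = self.env.reset() if done else next_obs
        return history


traj = ToyRunner().rollout(StubAgent(), steps=7)
```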

torchrl.runners.gym_runner module

class torchrl.runners.gym_runner.GymRunner(env_id: str, seed: int = None, n_envs: int = 1, log_level=40)[source]

Bases: torchrl.runners.base_runner.BaseRunner

This is a runner for OpenAI Gym Environments. It follows the reset(), step() and close() API to generate trajectories and renders each parallel trajectory in its own thread.

Parameters:
  • env_id (str) – Environment ID registered with Gym.
  • seed (int) – Optional integer to seed stochastic environments.
  • n_envs (int) – Number of parallel environments (= trajectories).
  • log_level (int) – Log levels from gym.logger. (DEBUG = 10, INFO = 20, WARN = 30, ERROR = 40, DISABLED = 50)
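One way a GymRunner-like class might derive reproducible per-environment seeds from the base seed is sketched below. The offset-by-index scheme and the ParallelRunner/ToyEnv names are assumptions for illustration; the real GymRunner's internals may differ:

```python
class ToyEnv:
    """Hypothetical stand-in for a gym.Env created from an env ID."""
    def __init__(self, env_id, seed):
        self.env_id = env_id
        self.seed = seed


class ParallelRunner:
    """Hypothetical runner seeding one environment per trajectory."""
    def __init__(self, env_id, seed=None, n_envs=1):
        base = 0 if seed is None else seed
        # One environment (= one trajectory) per parallel slot, each
        # with a distinct but reproducible seed.
        self.envs = [self.make_env(env_id, base + i) for i in range(n_envs)]

    def make_env(self, env_id, seed):
        return ToyEnv(env_id, seed)


runner = ParallelRunner("CartPole-v1", seed=1, n_envs=3)
seeds = [env.seed for env in runner.envs]
# seeds == [1, 2, 3]
```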

close()[source]

Shut down the Gym environments.

See close() for a general description.

compute_action(agent: torchrl.agents.base_agent.BaseAgent, obs_list: list)[source]

See compute_action() for a general description.

make_env(seed: int = None) → gym.core.Env[source]

Create and return the environment. See make_env() for a general description.


Note: This helper routine checks for any inactive environments and resets them to allow future rollouts.

process_transition(history, transition: tuple) → list[source]

Appends tuples of observation, action, reward, next observation and termination flag to the history.

See process_transition() for a general description.

rollout(agent, steps: int = None, render: bool = False, fps: int = 30) → list[source]

Roll out trajectories from the Gym environment. See rollout() for a general description.

Module contents