torchrl.envs package


torchrl.envs.env_utils module

torchrl.envs.env_utils.get_gym_spaces(make_env_fn: Callable[[...], gym.core.Env]) → Tuple[,]

A utility function to get the observation and action spaces of a Gym environment
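
As a rough illustration of what such a helper might do, the sketch below builds an environment from a factory, reads its two space attributes, and closes it again. It is not the library's implementation: `ToyEnv` and `get_spaces` are hypothetical names, and the placeholder tuples stand in for real Gym space objects.

```python
from typing import Any, Callable, Tuple


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (not part of torchrl or gym)."""

    # Placeholder values; a real Gym environment exposes space objects here.
    observation_space = ("box", 4)
    action_space = ("discrete", 2)

    def close(self) -> None:
        pass


def get_spaces(make_env_fn: Callable[..., Any]) -> Tuple[Any, Any]:
    """Create a throwaway environment and return (observation_space, action_space)."""
    env = make_env_fn()
    try:
        return env.observation_space, env.action_space
    finally:
        # Release the temporary environment even if attribute access fails.
        env.close()


obs_space, act_space = get_spaces(ToyEnv)
```

Taking a factory rather than an environment instance lets the caller inspect the spaces without keeping an environment alive.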

torchrl.envs.env_utils.make_gym_env(spec_id: str, seed: Optional[int] = None) → gym.core.Env

torchrl.envs.parallel_envs module

class torchrl.envs.parallel_envs.MultiProcWrapper(obj_fns, daemon=True, autostart=True)

Bases: object

A generic wrapper which takes a list of functions to be run inside a process. Each function must return an object, see target_fn for how it is used. Communication between each new process and the parent process happens via Pipes.
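
The pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `worker`, `call_all`, `run_demo`, `Counter`, and the `(name, args, kwargs)` message format are all assumptions made for the example. It also assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp

# Assumption: the "fork" start method (POSIX only), so that plain lambdas can
# be handed to child processes without pickling.
ctx = mp.get_context("fork")


def worker(conn, obj_fn):
    """Build the object inside the child, then serve method calls from the pipe."""
    obj = obj_fn()
    while True:
        msg = conn.recv()
        if msg is None:  # sentinel sent by the parent: shut down
            break
        fn_name, args, kwargs = msg
        conn.send(getattr(obj, fn_name)(*args, **kwargs))
    conn.close()


class Counter:
    """Hypothetical object living in a child process."""

    def __init__(self, start):
        self.value = start

    def add(self, n):
        self.value += n
        return self.value


def call_all(conns, fn_name, args_list):
    """Send one call to every child first, then collect the replies in order."""
    for conn, args in zip(conns, args_list):
        conn.send((fn_name, args, {}))
    return [conn.recv() for conn in conns]


def run_demo():
    obj_fns = [lambda: Counter(0), lambda: Counter(10)]
    conns, procs = [], []
    for obj_fn in obj_fns:
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child_conn, obj_fn), daemon=True)
        proc.start()
        conns.append(parent_conn)
        procs.append(proc)
    first = call_all(conns, "add", [(1,), (5,)])
    second = call_all(conns, "add", [(2,), (2,)])
    for conn in conns:
        conn.send(None)
    for proc in procs:
        proc.join()
    return first, second
```

Calling `run_demo()` shows that each object keeps its state in its own process between calls. Sending every request before reading any reply lets the children work concurrently.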

exec_remote(fn_string, proc_list=None, args_list=None, kwargs_list=None)

class torchrl.envs.parallel_envs.ParallelEnvs(make_env_fn, n_envs: int = 1, base_seed: int = 0, daemon: bool = True, autostart: bool = True)

Bases: torchrl.envs.parallel_envs.MultiProcWrapper

A utility class which wraps around multiple environments and runs them in subprocesses
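
To illustrate the index-based calling convention of the methods listed below, here is an in-process stand-in that steps only the environments that are still running. It is a sketch under stated assumptions, not the real class: `SequentialEnvs` and `CountdownEnv` are hypothetical, the factory is assumed to take an index, and the return values are assumed to be lists ordered like `env_ids`. None of this is confirmed by the documentation above.

```python
class CountdownEnv:
    """Hypothetical toy environment that ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


class SequentialEnvs:
    """In-process stand-in with reset(env_ids) and step(env_ids, actions)."""

    def __init__(self, make_env_fn, n_envs=1):
        # Assumption: the factory receives the environment index.
        self.envs = [make_env_fn(i) for i in range(n_envs)]

    def reset(self, env_ids):
        return [self.envs[i].reset() for i in env_ids]

    def step(self, env_ids, actions):
        return [self.envs[i].step(a) for i, a in zip(env_ids, actions)]


envs = SequentialEnvs(lambda i: CountdownEnv(horizon=i + 1), n_envs=3)
active = [0, 1, 2]
envs.reset(active)
steps_taken = {i: 0 for i in active}

while active:
    results = envs.step(active, [1] * len(active))
    still_running = []
    for env_id, (_obs, _reward, done, _info) in zip(active, results):
        steps_taken[env_id] += 1
        if not done:
            still_running.append(env_id)
    # Finished environments drop out; only the rest are stepped next round.
    active = still_running
```

Passing explicit `env_ids` lets a caller keep collecting from long episodes without resetting or stepping environments that have already finished.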

render(env_ids: list)

reset(env_ids: list)

step(env_ids: list, actions: list)

torchrl.envs.parallel_envs.target_fn(conn, obj_fn)

torchrl.envs.test_env_utils module

torchrl.envs.test_env_utils.test_make_gym_env(spec_id: str)

torchrl.envs.test_wrappers module

torchrl.envs.test_wrappers.test_transition_monitor(spec_id: str)

torchrl.envs.wrappers module

class torchrl.envs.wrappers.TransitionMonitor(env: gym.core.Env)

Bases: gym.core.Wrapper

Monitors any Gym environment by recording its transitions in a buffer

flush() → list

Empties the transition buffer on demand.

property info
property is_done
property obs

reset()

Resets the state of the environment and returns an initial observation.

Returns

the initial observation.

Return type

observation (object)


step(action)

Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

property transitions
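
The members above suggest a wrapper that records each step and hands the records back on request. The sketch below shows one way this could work, without depending on Gym. `RecordingWrapper` and `ToyEnv` are hypothetical. The `(obs, action, reward, next_obs, done)` tuple layout and the idea that `flush()` returns the buffered items are assumptions, not confirmed by the documentation.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym environment with a fixed episode length."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


class RecordingWrapper:
    """Records every transition produced by the wrapped environment."""

    def __init__(self, env):
        self.env = env
        self.obs = None
        self.is_done = True  # no episode is running until reset() is called
        self.info = None
        self.transitions = []

    def reset(self):
        self.obs = self.env.reset()
        self.is_done = False
        self.info = None
        return self.obs

    def step(self, action):
        next_obs, reward, done, info = self.env.step(action)
        # Assumed record layout: (obs, action, reward, next_obs, done).
        self.transitions.append((self.obs, action, reward, next_obs, done))
        self.obs, self.is_done, self.info = next_obs, done, info
        return next_obs, reward, done, info

    def flush(self):
        """Return the buffered transitions and start a fresh buffer."""
        recorded, self.transitions = self.transitions, []
        return recorded


env = RecordingWrapper(ToyEnv(horizon=3))
env.reset()
while not env.is_done:
    env.step(1)
batch = env.flush()
```

Keeping the latest observation on the wrapper lets the rollout loop avoid tracking it separately, and swapping in a new list in `flush()` leaves the returned batch untouched by later steps.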

Module contents