torchrl.envs package

Submodules

torchrl.envs.env_utils module

torchrl.envs.env_utils.get_gym_spaces(make_env_fn: Callable[[...], gym.core.Env]) → Tuple[gym.spaces.space.Space, gym.spaces.space.Space]

A utility function that returns the observation and action spaces of a Gym environment.
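As a rough illustration of what such a helper might look like, the sketch below builds an environment from a factory, reads its two space attributes, and closes it again. `get_spaces_sketch` and `DummyEnv` are hypothetical names introduced here, and the sketch is not the library's implementation; it only assumes the standard Gym attributes `observation_space` and `action_space`.

```python
from typing import Any, Callable, Tuple


class DummyEnv:
    """Hypothetical stand-in exposing the attributes a Gym env provides."""

    def __init__(self) -> None:
        # Placeholders for real gym.spaces objects (assumption for illustration).
        self.observation_space = "Box(4,)"
        self.action_space = "Discrete(2)"
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_spaces_sketch(make_env_fn: Callable[..., Any]) -> Tuple[Any, Any]:
    """Create a temporary env, read its spaces, and always close it."""
    env = make_env_fn()
    try:
        return env.observation_space, env.action_space
    finally:
        env.close()


obs_space, act_space = get_spaces_sketch(DummyEnv)
```

Taking a factory rather than an environment instance lets the helper own the temporary environment's lifetime, so the caller never has to clean it up.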

torchrl.envs.env_utils.make_gym_env(spec_id: str, seed: Optional[int] = None) → gym.core.Env

torchrl.envs.parallel_envs module

class torchrl.envs.parallel_envs.MultiProcWrapper(obj_fns, daemon=True, autostart=True)

Bases: object

A generic wrapper that takes a list of functions, each to be run inside a separate process. Each function must return an object; see target_fn for how that object is used. Communication between each new process and the parent process happens via Pipes.
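The pattern described above can be sketched with the standard library alone. This is a minimal sketch, not the class's actual code: `worker_sketch` and `call_in_children` are hypothetical names, and the message format (a `(fn_string, args, kwargs)` tuple with `None` as a stop sentinel) is an assumption. It prefers the `fork` start method where available so it runs without extra setup on POSIX systems.

```python
import multiprocessing as mp
from functools import partial


def worker_sketch(conn, obj_fn):
    """Build the object in the child, then serve method calls until told to stop."""
    obj = obj_fn()
    while True:
        msg = conn.recv()
        if msg is None:  # sentinel: the parent asked us to shut down
            break
        fn_string, args, kwargs = msg
        conn.send(getattr(obj, fn_string)(*args, **kwargs))
    conn.close()


def call_in_children(obj_fns, fn_string, args=(), kwargs=None):
    """Start one process per factory, call one method on each object, collect results."""
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)

    pipes, procs = [], []
    for obj_fn in obj_fns:
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker_sketch, args=(child_conn, obj_fn), daemon=True)
        proc.start()
        child_conn.close()  # the parent keeps only its own end of the pipe
        pipes.append(parent_conn)
        procs.append(proc)

    # Send the request to every child first, then gather the replies in order.
    for conn in pipes:
        conn.send((fn_string, args, kwargs or {}))
    results = [conn.recv() for conn in pipes]

    # Shut the children down cleanly.
    for conn, proc in zip(pipes, procs):
        conn.send(None)
        proc.join()
        conn.close()
    return results


if __name__ == "__main__":
    factories = [partial(list, [1, 2, 2]), partial(list, [2, 2, 2])]
    print(call_in_children(factories, "count", args=(2,)))
```

Dispatching by method name keeps the messages small and picklable: only strings and arguments cross the pipe, while the object itself lives entirely inside the child process.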

exec_remote(fn_string, proc_list=None, args_list=None, kwargs_list=None)

start()

stop()

class torchrl.envs.parallel_envs.ParallelEnvs(make_env_fn, n_envs: int = 1, base_seed: int = 0, daemon: bool = True, autostart: bool = True)

Bases: torchrl.envs.parallel_envs.MultiProcWrapper

A utility class which wraps around multiple environments and runs them in subprocesses

close()

render(env_ids: list)

reset(env_ids: list)

step(env_ids: list, actions: list)

torchrl.envs.parallel_envs.target_fn(conn, obj_fn)

torchrl.envs.test_env_utils module

torchrl.envs.test_env_utils.test_make_gym_env(spec_id: str)

torchrl.envs.test_wrappers module

torchrl.envs.test_wrappers.test_transition_monitor(spec_id: str)

torchrl.envs.wrappers module

class torchrl.envs.wrappers.TransitionMonitor(env: gym.core.Env)

Bases: gym.core.Wrapper

Monitor any gym environment

flush() → list

Empty transition buffer on demand.
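To make the buffering idea concrete, here is a minimal sketch of a monitor that records each transition and hands the buffer over on `flush()`. `MonitorSketch` and `CountdownEnv` are hypothetical names, and the tuple layout `(obs, action, reward, next_obs, done)` is an assumption; the real class may store transitions differently.

```python
class CountdownEnv:
    """Hypothetical toy env following the classic Gym reset/step contract."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


class MonitorSketch:
    """Records every (obs, action, reward, next_obs, done) tuple it sees."""

    def __init__(self, env):
        self.env = env
        self.transitions = []
        self.obs = None

    def reset(self, **kwargs):
        self.obs = self.env.reset(**kwargs)
        return self.obs

    def step(self, action):
        next_obs, reward, done, info = self.env.step(action)
        self.transitions.append((self.obs, action, reward, next_obs, done))
        self.obs = next_obs
        return next_obs, reward, done, info

    def flush(self):
        """Return the buffered transitions and start a fresh buffer."""
        out, self.transitions = self.transitions, []
        return out


env = MonitorSketch(CountdownEnv(horizon=2))
env.reset()
env.step(1)
env.step(0)
batch = env.flush()  # the buffer is empty again after this call
```

Swapping in a new list on `flush()` means the caller owns the returned batch outright, and later steps cannot mutate it.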

property info

property is_done

property obs

reset(**kwargs)

Resets the state of the environment and returns an initial observation.

Returns

The initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

A tuple (observation, reward, done, info):

observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after the previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple
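The reset/step contract described here can be exercised with a short episode loop. The sketch below uses a hypothetical toy environment (`FixedLengthEnv`) and a hypothetical helper (`run_episode`); it assumes only the four-element return of `step()` and the rule that `reset()` must be called before stepping again once `done` is true.

```python
class FixedLengthEnv:
    """Hypothetical toy env: reward 1.0 per step, episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {"t": self.t}


def run_episode(env, policy, max_steps=1000):
    """Reset the env, then step until `done` is reported or max_steps is hit."""
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return total_reward, steps


env = FixedLengthEnv(length=5)
first = run_episode(env, policy=lambda obs: 0)
# run_episode calls reset() itself, so a second episode starts from a clean state.
second = run_episode(env, policy=lambda obs: 0)
```

The `max_steps` cap is a defensive choice for environments that never report `done`; it is not part of the Gym contract.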

property transitions

Module contents