Gym Wrappers

Additional Gym Wrappers to enhance Gym environments.


class sb3_contrib.common.wrappers.TimeFeatureWrapper(env, max_steps=1000, test_mode=False)[source]

Add remaining, normalized time to observation space for fixed length episodes. See https://arxiv.org/abs/1712.00378 and https://github.com/aravindr93/mjrl/issues/13.


Only gym.spaces.Box and gym.spaces.Dict (gym.GoalEnv) 1D observation spaces are supported for now.

Parameters:

  • env (Env) – Gym env to wrap.

  • max_steps (int) – Max number of steps of an episode if it is not wrapped in a TimeLimit object.

  • test_mode (bool) – In test mode, the time feature is constant, equal to zero. This allows checking that the agent did not overfit to this feature by learning a deterministic, pre-defined sequence of actions.
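
A minimal usage sketch (Pendulum-v1 is just an example environment; any env with a supported 1D observation space works):

    import gym
    from sb3_contrib.common.wrappers import TimeFeatureWrapper

    # Wrap the environment; the wrapper appends one extra entry to each
    # observation: the normalized remaining time of the episode.
    env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
    obs = env.reset()
    print(obs.shape)  # one dimension larger than the unwrapped observation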


reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Return type:

    Union[Tuple, Dict[str, Any], ndarray, int]

Returns:

    observation (object): the initial observation.
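
As an illustration (assuming, as noted above, that the time feature is appended as the last observation entry and encodes remaining time, so it starts at 1.0 outside of test mode):

    import gym
    from sb3_contrib.common.wrappers import TimeFeatureWrapper

    env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
    obs = env.reset()
    # A fresh episode has its full time budget remaining.
    assert obs[-1] == 1.0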


step(action)[source]

Run one timestep of the environment's dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment's state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:

  • action (object) – an action provided by the agent

Returns:

  • observation (object): agent's observation of the current environment

  • reward (float): amount of reward returned after the previous action

  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results

  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type:

    Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]
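
A short interaction sketch (random actions for illustration; again assuming the time feature is the last observation entry):

    import gym
    from sb3_contrib.common.wrappers import TimeFeatureWrapper

    env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
    # The remaining-time feature shrinks toward 0.0 as the time limit approaches.
    print(obs[-1])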