Gym Wrappers
Additional Gym Wrappers to enhance Gym environments.
TimeFeatureWrapper
- class sb3_contrib.common.wrappers.TimeFeatureWrapper(env, max_steps=1000, test_mode=False)
Add remaining, normalized time to observation space for fixed length episodes. See https://arxiv.org/abs/1712.00378 and https://github.com/aravindr93/mjrl/issues/13.
Note
Only gym.spaces.Box and gym.spaces.Dict (gym.GoalEnv) 1D observation spaces are supported for now.
- Parameters:
env (Env) – Gym env to wrap.
max_steps (int) – Max number of steps of an episode if it is not wrapped in a TimeLimit object.
test_mode (bool) – In test mode, the time feature is constant, equal to zero. This allows checking that the agent did not overfit this feature by learning a deterministic, pre-defined sequence of actions.
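To make the idea of a "remaining, normalized time" feature concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `time_feature` and `augment`, the `test_value` argument, and the clamping at zero are assumptions made for illustration, and the constant used in test mode simply follows the parameter description above.

```python
from typing import List, Sequence


def time_feature(current_step: int, max_steps: int = 1000,
                 test_mode: bool = False, test_value: float = 0.0) -> float:
    """Return the remaining episode time, normalized to [0, 1].

    Hypothetical helper: `test_value` mirrors the constant described for
    test mode in the parameter list; check the library source for the
    value it actually uses.
    """
    if test_mode:
        # Constant feature: the agent cannot read the time from it.
        return test_value
    remaining = 1.0 - current_step / max_steps
    # Clamping is an assumption of this sketch, for episodes that overrun.
    return max(0.0, remaining)


def augment(obs: Sequence[float], current_step: int, **kwargs) -> List[float]:
    """Append the time feature to a flat (1D) observation."""
    return list(obs) + [time_feature(current_step, **kwargs)]
```

For example, `augment([0.1, 0.2], 5, max_steps=10)` appends a feature halfway between the start-of-episode value (1.0) and the end-of-episode value (0.0), which is why the wrapped observation space needs one extra dimension.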
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Return type:
Union[Tuple, Dict[str, Any], ndarray, int]
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Return type:
Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
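The reset()/step() contract described above can be sketched as a simple rollout loop. This example uses only the standard library; `CountdownEnv` and `run_episode` are hypothetical stand-ins written for illustration, not part of Gym or sb3_contrib, and they assume the four-element (observation, reward, done, info) return value documented here.

```python
from typing import Any, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> List[float]:
        # Start a fresh episode, independent of the previous one.
        self.t = 0
        return [0.0]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        self.t += 1
        obs = [float(self.t)]
        reward = 1.0
        done = self.t >= self.horizon
        return obs, reward, done, {}


def run_episode(env: CountdownEnv) -> Tuple[float, int]:
    """Run one episode and return (total reward, number of steps)."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = 0  # placeholder policy; a trained agent would pick this from obs
        obs, reward, done, info = env.step(action)
        total += reward
        steps += 1
    # Once done is True, the caller must call reset() before stepping again.
    return total, steps
```

Calling `run_episode` twice on the same environment works because each call begins with reset(), matching the requirement that the caller resets the environment once an episode has ended.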