RL Algorithms

This table displays the rl algorithms that are implemented in the Stable Baselines3 contrib project, along with some useful characteristics: support for discrete/continuous actions, multiprocessing.

Name

Box

Discrete

MultiDiscrete

MultiBinary

Multi Processing

ARS

✔️

❌️

✔️

MaskablePPO

✔️

✔️

✔️

✔️

QR-DQN

️❌

️✔️

✔️

RecurrentPPO

✔️

✔️

✔️

✔️

✔️

TQC

✔️

✔️

TRPO

✔️

✔️

✔️

✔️

✔️

Note

Tuple observation spaces are not supported by any environment, however, single-level Dict spaces are

Actions gym.spaces:

  • Box: A N-dimensional box that contains every point in the action space.

  • Discrete: A list of possible actions, where each timestep only one of the actions can be used.

  • MultiDiscrete: A list of possible actions, where each timestep only one action of each discrete set can be used.

  • MultiBinary: A list of possible actions, where each timestep any of the actions can be used in any combination.