RL Algorithms
This table lists the RL algorithms implemented in the Stable Baselines3 Contrib project, along with some useful characteristics: which action spaces (continuous and discrete) each one supports, and whether it supports multiprocessing.
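| Name | Box | Discrete | MultiDiscrete | MultiBinary | Multi Processing |
|---|---|---|---|---|---|
| ARS | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| MaskablePPO | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| QRDQN | ❌ | ✔️ | ❌ | ❌ | ✔️ |
| RecurrentPPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| TQC | ✔️ | ❌ | ❌ | ❌ | ✔️ |
| TRPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |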
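As a quick way to query the table, the support matrix can be written out as plain data. The sketch below is only an illustration: it assumes the four action-space columns are Box, Discrete, MultiDiscrete, and MultiBinary in that order, and the names `ACTION_SPACES`, `SUPPORT`, and `algorithms_supporting` are hypothetical helpers, not part of the library.

```python
from typing import List

# Support matrix transcribed from the table above (True = check mark).
# Assumption: the columns are Box, Discrete, MultiDiscrete, MultiBinary.
ACTION_SPACES = ("Box", "Discrete", "MultiDiscrete", "MultiBinary")

SUPPORT = {
    "ARS":          (True, False, False, False),
    "MaskablePPO":  (False, True, True, True),
    "QRDQN":        (False, True, False, False),
    "RecurrentPPO": (True, True, True, True),
    "TQC":          (True, False, False, False),
    "TRPO":         (True, True, True, True),
}


def algorithms_supporting(space_name: str) -> List[str]:
    """Return the algorithms whose row has a check mark for `space_name`."""
    # Hypothetical helper; raises ValueError for an unknown space name.
    column = ACTION_SPACES.index(space_name)
    return [name for name, row in SUPPORT.items() if row[column]]


if __name__ == "__main__":
    for space in ACTION_SPACES:
        print(f"{space}: {', '.join(algorithms_supporting(space))}")
```

Read this way, a continuous-control task narrows the choice to the algorithms with a check mark under Box, while a task with a single discrete action narrows it to those marked under Discrete.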
Note
Tuple observation spaces are not supported for any environment; however, single-level Dict spaces are supported.
Actions gym.spaces:
- Box: An N-dimensional box that contains every point in the action space.
- Discrete: A list of possible actions, where only one of the actions can be used at each timestep.
- MultiDiscrete: A list of possible actions, where only one action from each discrete set can be used at each timestep.
- MultiBinary: A list of possible actions, where any of the actions can be used in any combination at each timestep.
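To make the four definitions above concrete, here is a minimal standard-library sketch of what one valid action looks like for each space type. It deliberately does not use the gym.spaces classes themselves: the `sample_*` functions are hypothetical stand-ins that only mimic the semantics described in the list.

```python
import random
from typing import List


def sample_box(low: float, high: float, n: int) -> List[float]:
    """Box: one real value per dimension, each within [low, high]."""
    return [random.uniform(low, high) for _ in range(n)]


def sample_discrete(n: int) -> int:
    """Discrete: exactly one of n possible actions per timestep."""
    return random.randrange(n)


def sample_multi_discrete(sizes: List[int]) -> List[int]:
    """MultiDiscrete: one choice from each discrete set per timestep."""
    return [random.randrange(size) for size in sizes]


def sample_multi_binary(n: int) -> List[int]:
    """MultiBinary: each of n actions is independently on (1) or off (0)."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print("Box:", sample_box(-1.0, 1.0, 3))
    print("Discrete:", sample_discrete(4))
    print("MultiDiscrete:", sample_multi_discrete([3, 2]))
    print("MultiBinary:", sample_multi_binary(5))
```

The practical difference is in the shape of an action: Box yields a vector of floats, Discrete a single integer, MultiDiscrete one integer per set, and MultiBinary a vector of 0/1 flags.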