Changelog
Release 2.7.1 (2025-12-05)
Warning
Stable-Baselines3 (SB3) v2.7.1 will be the last version supporting Python 3.9 (end of life in October 2025). We highly recommend that you upgrade to Python >= 3.10.
Breaking Changes:
New Features:
Bug Fixes:
Fix tensorboard log name for MaskablePPO
Deprecations:
Others:
Documentation:
Release 2.7.0 (2025-07-25)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 2.7.0
New Features:
Added support for n-step returns for off-policy algorithms via the n_steps parameter
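As a rough illustration of what an n-step return computes, the sketch below sums the next n discounted rewards and then bootstraps from a value estimate for the state reached after those n steps. It is a standalone example, not the library's implementation: the helper name `n_step_return` and the sample numbers are hypothetical, and episode termination inside the n-step window is ignored for brevity.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of the given rewards plus a discounted bootstrap value.

    rewards: the n rewards r_t, ..., r_{t+n-1} (hypothetical sample data)
    bootstrap_value: value estimate for the state reached after n steps
    """
    n = len(rewards)
    discounted = sum(gamma**k * r for k, r in enumerate(rewards))
    return discounted + gamma**n * bootstrap_value


# Three-step target with gamma = 0.5 so the arithmetic is easy to check:
# 1 + 0.5 * 1 + 0.25 * 1 + 0.125 * 8 = 2.75
target = n_step_return([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5)
print(target)  # 2.75
```

With a single reward in the list, this reduces to the usual one-step target.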
Bug Fixes:
Use the FloatSchedule and LinearSchedule classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems
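One way to see why a schedule class can travel better than a lambda is that its state is plain data. The sketch below is an illustrative assumption about the motivation, not a description of SB3 internals. `LinearDecay` is a hypothetical stand-in and does not reproduce the signature of the real FloatSchedule or LinearSchedule classes. It only shows that a callable object can be rebuilt from its attributes, whereas a lambda has to be shipped as code.

```python
import json


class LinearDecay:
    """Hypothetical stand-in for a schedule class (not the SB3 implementation).

    Maps progress_remaining (1.0 at the start of training, 0.0 at the end)
    to a value that moves linearly from `start` to `end`.
    """

    def __init__(self, start: float, end: float) -> None:
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)


schedule = LinearDecay(start=1.0, end=0.0)
as_lambda = lambda progress_remaining: progress_remaining * 1.0  # same curve

# Both callables give the same value for the same progress.
print(schedule(0.25), as_lambda(0.25))  # 0.25 0.25

# The class instance carries its state as plain data, so it can be written
# out and rebuilt without shipping any function bytecode.
state = json.dumps(vars(schedule))
restored = LinearDecay(**json.loads(state))
print(state, restored(0.5))
```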
Deprecations:
Others:
Documentation:
Release 2.6.0 (2025-03-24)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 2.6.0
Renamed _dump_logs() to dump_logs()
New Features:
Added support for Gymnasium v1.1.0
Bug Fixes:
Fixed issues with SubprocVecEnv and MaskablePPO by using vec_env.has_attr() (pickling issues, mask function not present)
Release 2.5.0 (2025-01-27)
Breaking Changes:
Upgraded to PyTorch 2.3.0
Dropped Python 3.8 support
Upgraded to Stable-Baselines3 >= 2.5.0
New Features:
Added Python 3.12 support
Added Numpy v2.0 support
Release 2.4.0 (2024-11-18)
New algorithm: added CrossQ, Gymnasium v1.0 support
Breaking Changes:
Upgraded to Stable-Baselines3 >= 2.4.0
New Features:
Added CrossQ algorithm, from “Batch Normalization in Deep Reinforcement Learning” paper (@danielpalen)
Added BatchRenorm PyTorch layer used in CrossQ (@danielpalen)
Added support for Gymnasium v1.0
Bug Fixes:
Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
Updated QR-DQN paper link in docs (@corentinlger)
Fixed a warning with PyTorch 2.4 when loading a RecurrentPPO model (You are using torch.load with weights_only=False)
Fixed a bug where loading a QRDQN model changed target_update_interval (@jak3122)
Others:
Updated PyTorch version on CI to 2.3.1
Remove unnecessary SDE noise resampling in PPO/TRPO update
Switched to uv to download packages on GitHub CI
Release 2.3.0 (2024-03-31)
New default hyperparameters for QR-DQN
Breaking Changes:
Upgraded to Stable-Baselines3 >= 2.3.0
The default learning_starts parameter of QRDQN has been changed to be consistent with the other off-policy algorithms
# SB3 < 2.3.0 default hyperparameters; 50_000 corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
New Features:
Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
Log success rate rollout/success_rate when available for on-policy algorithms
Others:
Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
Fixed sb3_contrib/common/maskable/*.py type annotations
Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
Fixed sb3_contrib/common/vec_env/async_eval.py type annotations
Documentation:
Add some additional notes about MaskablePPO (evaluation and multi-process) (@icheered)
Release 2.2.1 (2023-11-17)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 2.2.1
Switched to ruff for sorting imports (isort is no longer needed), black and ruff now require a minimum version
Dropped x is False in favor of not x, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
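The practical effect of moving from an identity check against False to a truthiness check can be seen in a few lines of plain Python. The function names below are hypothetical and only mimic the two styles of check; they are not taken from the library.

```python
def should_stop_old(callback_result):
    # Older style check: only the literal False stops training.
    return callback_result is False


def should_stop_new(callback_result):
    # Newer style check: any falsy value (False, None, 0, ...) stops training.
    return not callback_result


def buggy_callback():
    # Forgot to return a boolean, so this implicitly returns None.
    pass


result = buggy_callback()
print(should_stop_old(result))  # False: training would have continued
print(should_stop_new(result))  # True: training now stops
```

A callback that is meant to let training continue should therefore return True explicitly.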
New Features:
Added set_options for AsyncEval
Added rollout_buffer_class and rollout_buffer_kwargs arguments to TRPO
Others:
Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
Update dependencies (accept newer Shimmy/Sphinx version and remove sphinx_autodoc_typehints)
Release 2.1.0 (2023-08-17)
Breaking Changes:
Removed Python 3.7 support
SB3 now requires PyTorch > 1.13
Upgraded to Stable-Baselines3 >= 2.1.0
New Features:
Added Python 3.11 support
Bug Fixes:
Fixed MaskablePPO ignoring stats_window_size argument
Release 2.0.0 (2023-06-22)
Gymnasium support
Warning
Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023). We highly recommend that you upgrade to Python >= 3.8.
Breaking Changes:
Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the shimmy package (@carlosluis, @arjun-kg, @tlpss)
Upgraded to Stable-Baselines3 >= 2.0.0
Bug Fixes:
Fixed QRDQN update interval for multi envs
Others:
Fixed sb3_contrib/tqc/*.py type hints
Fixed sb3_contrib/trpo/*.py type hints
Fixed sb3_contrib/common/envs/invalid_actions_env.py type hints
Documentation:
Update documentation, switch from Gym to Gymnasium
Release 1.8.0 (2023-04-07)
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here: https://gymnasium.farama.org/content/migration-guide/. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
Breaking Changes:
Removed shared layers in mlp_extractor (@AlexPasqua)
Upgraded to Stable-Baselines3 >= 1.8.0
New Features:
Added stats_window_size argument to control smoothing in rollout logging (@jonasreiher)
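The smoothing effect of a window size on logged episode statistics can be sketched with a bounded deque: the logged value is the mean over only the most recent episodes. This is a minimal standalone sketch; the helper name `rolling_mean_rewards` and the sample returns are hypothetical, and the library's actual logging code may differ.

```python
from collections import deque


def rolling_mean_rewards(episode_rewards, stats_window_size):
    """Mean over the most recent `stats_window_size` episodes, after each episode."""
    window = deque(maxlen=stats_window_size)  # old entries fall out automatically
    means = []
    for reward in episode_rewards:
        window.append(reward)
        means.append(sum(window) / len(window))
    return means


rewards = [0.0, 10.0, 0.0, 10.0]  # hypothetical episode returns
print(rolling_mean_rewards(rewards, stats_window_size=1))  # [0.0, 10.0, 0.0, 10.0]
print(rolling_mean_rewards(rewards, stats_window_size=2))  # [0.0, 5.0, 5.0, 5.0]
```

A window of 1 reports each episode as-is, while a larger window trades responsiveness for a smoother curve.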
Others:
Moved to pyproject.toml
Added github issue forms
Fixed Atari Roms download in CI
Fixed sb3_contrib/qrdqn/*.py type hints
Switched from flake8 to ruff
Documentation:
Added warning about potential crashes caused by check_env in the MaskablePPO docs (@AlexPasqua)
Release 1.7.0 (2023-01-10)
Warning
Shared layers in MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0 and the behavior of net_arch=[64, 64]
will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
Breaking Changes:
Removed deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters, please use an EvalCallback instead
Removed deprecated sde_net_arch parameter
Upgraded to Stable-Baselines3 >= 1.7.0
New Features:
Introduced mypy type checking
Added support for Python 3.10
Added with_bias parameter to ARSPolicy
Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
Features extractors now properly support unnormalized image-like observations (3D tensor) when passing normalize_images=False
Bug Fixes:
Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1
Deprecations:
You should now explicitly pass a features_extractor parameter when calling extract_features()
Deprecated shared layers in MlpExtractor (@AlexPasqua)
Others:
Fixed flake8 config
Fixed sb3_contrib/common/utils.py type hint
Fixed sb3_contrib/common/recurrent/type_aliases.py type hint
Fixed sb3_contrib/ars/policies.py type hint
Exposed modules in __init__.py with __all__ attribute (@ZikangXiong)
Removed ignores on Flake8 F401 (@ZikangXiong)
Upgraded GitHub CI/setup-python to v4 and checkout to v3
Set tensors construction directly on the device
Standardized the use of from gym import spaces
Release 1.6.2 (2022-10-10)
Progress bar and upgrade to latest SB3 version
Breaking Changes:
Upgraded to Stable-Baselines3 >= 1.6.2
New Features:
Added progress_bar argument in the learn() method, displayed using TQDM and rich packages
Deprecations:
Deprecate parameters eval_env, eval_freq and create_eval_env
Others:
Fixed the return type of .load() methods so that they now use TypeVar
Release 1.6.1 (2022-09-29)
Bug fix release
Breaking Changes:
Fixed the issue that predict does not always return action as np.ndarray (@qgallouedec)
Upgraded to Stable-Baselines3 >= 1.6.1
New Features:
Bug Fixes:
Fixed the issue of wrongly passing policy arguments when using CnnLstmPolicy or MultiInputLstmPolicy with RecurrentPPO (@mlodel)
Fixed division by zero error when computing FPS when a very small amount of time has elapsed in operating systems with low-precision timers.
Fixed calling child callbacks in MaskableEvalCallback (@CppMaster)
Fixed missing verbose parameter passing in the MaskableEvalCallback constructor (@burakdmb)
Fixed the issue that when updating the target network in QRDQN, TQC, the running_mean and running_var properties of batch norm layers are not updated (@honglu2875)
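The FPS division-by-zero fix listed above can be pictured as clamping the elapsed time to a tiny positive value before dividing. This is a minimal sketch under that assumption: the helper name `compute_fps` and its arguments are hypothetical, and the library's own code may be structured differently.

```python
import sys


def compute_fps(num_timesteps, start_time_ns, now_ns):
    """Frames per second, guarded against a zero elapsed time."""
    elapsed_seconds = (now_ns - start_time_ns) / 1e9
    # Clamp to machine epsilon so the division below cannot fail
    elapsed_seconds = max(elapsed_seconds, sys.float_info.epsilon)
    return int(num_timesteps / elapsed_seconds)


# A coarse timer can report identical timestamps for "start" and "now";
# the result is a very large number rather than a ZeroDivisionError.
print(compute_fps(num_timesteps=100, start_time_ns=5_000, now_ns=5_000))

# One full second elapsed
print(compute_fps(num_timesteps=100, start_time_ns=0, now_ns=1_000_000_000))  # 100
```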
Deprecations:
Others:
Changed the default buffer device from "cpu" to "auto"
Release 1.6.0 (2022-07-11)
Add RecurrentPPO (aka PPO LSTM)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 1.6.0
Changed the way policy “aliases” are handled (“MlpPolicy”, “CnnPolicy”, …), removing the former register_policy helper, policy_base parameter and using policy_aliases static attributes instead (@Gregwar)
Renamed rollout/exploration rate key to rollout/exploration_rate for QRDQN (to be consistent with SB3 DQN)
Upgraded to Python 3.7+ syntax using pyupgrade
SB3 now requires PyTorch >= 1.11
Changed the default network architecture when using CnnPolicy or MultiInputPolicy with TQC: share_features_extractor is now set to False by default and net_arch defaults to [256, 256] (instead of the previous net_arch=[])
New Features:
Added RecurrentPPO (aka PPO LSTM)
Bug Fixes:
Fixed a bug in RecurrentPPO when calculating the masked loss functions (@rnederstigt)
Fixed a bug in TRPO where kl divergence was not implemented for MultiDiscrete space
Deprecations:
Release 1.5.0 (2022-03-25)
Breaking Changes:
Switched minimum Gym version to 0.21.0.
Upgraded to Stable-Baselines3 >= 1.5.0
New Features:
Allow PPO to turn off advantage normalization (see PR #61) (@vwxyzjn)
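For context, advantage normalization usually means rescaling a batch of advantages to zero mean and unit standard deviation before the policy loss is computed. The sketch below shows that operation behind an on/off switch. The function name `maybe_normalize`, the `normalize` flag, and the `eps` value are assumptions made for illustration and are not taken from the library.

```python
from statistics import fmean, stdev


def maybe_normalize(advantages, normalize=True, eps=1e-8):
    """Return advantages rescaled to zero mean and unit std, or unchanged."""
    # With fewer than two samples the standard deviation is undefined,
    # so the batch is returned as-is.
    if not normalize or len(advantages) < 2:
        return list(advantages)
    mean = fmean(advantages)
    std = stdev(advantages)  # sample standard deviation
    return [(a - mean) / (std + eps) for a in advantages]


batch = [1.0, 2.0, 3.0]  # hypothetical advantage estimates
print(maybe_normalize(batch))                   # approximately [-1.0, 0.0, 1.0]
print(maybe_normalize(batch, normalize=False))  # [1.0, 2.0, 3.0]
```

Turning normalization off keeps the raw scale of the advantages, which can matter when batches are very small.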
Bug Fixes:
Removed explicit calls to forward() method as per PyTorch guidelines
Deprecations:
Others:
Documentation:
Release 1.4.0 (2022-01-19)
Add Trust Region Policy Optimization (TRPO) and Augmented Random Search (ARS) algorithms
Breaking Changes:
Dropped python 3.6 support
Upgraded to Stable-Baselines3 >= 1.4.0
MaskablePPO was updated to match latest SB3 PPO version (timeout handling and new method for the policy object)
New Features:
Added TRPO (@cyprienc)
Added experimental support to train off-policy algorithms with multiple envs (note: HerReplayBuffer currently not supported)
Added Augmented Random Search (ARS) (@sgillen)
Bug Fixes:
Deprecations:
Others:
Improve test coverage for MaskablePPO
Documentation:
Release 1.3.0 (2021-10-23)
Add Invalid action masking for PPO
Warning
This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend that you upgrade to Python >= 3.7.
Breaking Changes:
Removed sde_net_arch
Upgraded to Stable-Baselines3 >= 1.3.0
New Features:
Added MaskablePPO algorithm (@kronion)
MaskablePPO Dictionary Observation support (@glmcdona)
Bug Fixes:
Deprecations:
Others:
Documentation:
Release 1.2.0 (2021-09-08)
Train/Eval mode support
Breaking Changes:
Upgraded to Stable-Baselines3 >= 1.2.0
Bug Fixes:
QR-DQN and TQC updated so that their policies are switched between train and eval mode at the correct time (@ayeright)
Deprecations:
Others:
Fixed type annotation
Added python 3.9 to CI
Documentation:
Release 1.1.0 (2021-07-01)
Dictionary observation support and timeout handling
Breaking Changes:
Added support for Dictionary observation spaces (cf. SB3 doc)
Upgraded to Stable-Baselines3 >= 1.1.0
Added proper handling of timeouts for off-policy algorithms (cf. SB3 doc)
Updated usage of logger (cf. SB3 doc)
Bug Fixes:
Removed unused code in TQC
Deprecations:
Others:
SB3 docs and tests dependencies are no longer required for installing SB3 contrib
Documentation:
Updated QR-DQN docs checkmark typo (@minhlong94)
Release 1.0 (2021-03-17)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 1.0
Bug Fixes:
Fixed a bug with QR-DQN predict method when using deterministic=False with image space
Pre-Release 0.11.1 (2021-02-27)
Bug Fixes:
Upgraded to Stable-Baselines3 >= 0.11.1
Pre-Release 0.11.0 (2021-02-27)
Breaking Changes:
Upgraded to Stable-Baselines3 >= 0.11.0
New Features:
Added TimeFeatureWrapper to the wrappers
Added QR-DQN algorithm (@ku2482)
Bug Fixes:
Fixed bug in TQC when saving/loading the policy only with non-default number of quantiles
Fixed bug in QR-DQN when calculating the target quantiles (@ku2482, @guyk1971)
Deprecations:
Others:
Updated TQC to match new SB3 version
Updated SB3 min version
Moved quantile_huber_loss to common/utils.py (@ku2482)
Documentation:
Pre-Release 0.10.0 (2020-10-28)
Truncated Quantiles Critic (TQC)
Breaking Changes:
New Features:
Added TQC algorithm (@araffin)
Bug Fixes:
Fixed features extractor issue (TQC with CnnPolicy)
Deprecations:
Others:
Documentation:
Added initial documentation
Added contribution guide and related PR templates
Maintainers
Stable-Baselines3 is currently maintained by Antonin Raffin (aka @araffin), Ashley Hill (aka @hill-a), Maximilian Ernestus (aka @ernestum), Adam Gleave (@AdamGleave) and Anssi Kanervisto (aka @Miffyli).
Contributors:
@ku2482 @guyk1971 @minhlong94 @ayeright @kronion @glmcdona @cyprienc @sgillen @Gregwar @rnederstigt @qgallouedec @mlodel @CppMaster @burakdmb @honglu2875 @ZikangXiong @AlexPasqua @jonasreiher @icheered @Armandpl @danielpalen @corentinlger