Examples

TQC

Train a Truncated Quantile Critics (TQC) agent on the Pendulum environment.

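from sb3_contrib import TQC

# Drop the top 2 quantiles per critic network to curb Q-value overestimation.
model = TQC("MlpPolicy", "Pendulum-v1", top_quantiles_to_drop_per_net=2, verbose=1)
# Train for 10,000 environment steps, logging every 4 episodes.
model.learn(total_timesteps=10000, log_interval=4)
# Save the trained agent to tqc_pendulum.zip.
model.save("tqc_pendulum")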
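
The `top_quantiles_to_drop_per_net` argument controls how much of the critics' pooled quantile distribution is discarded when the target is built. The following is a minimal pure-Python sketch of that truncation step, not the library's implementation. The helper name `truncate_pooled_quantiles` and the toy values are hypothetical, and the sketch assumes that the quantile estimates of all critics are pooled, sorted, and the largest `top_quantiles_to_drop_per_net * n_critics` values are removed.

```python
def truncate_pooled_quantiles(quantiles_per_critic, top_quantiles_to_drop_per_net):
    """Pool the quantiles of all critics, sort them, and drop the largest ones.

    Hypothetical helper for illustration only; not part of sb3_contrib.
    """
    n_critics = len(quantiles_per_critic)
    # Pool every critic's quantile estimates into one sorted list.
    pooled = sorted(q for critic in quantiles_per_critic for q in critic)
    n_keep = len(pooled) - top_quantiles_to_drop_per_net * n_critics
    if n_keep <= 0:
        raise ValueError("Dropping this many quantiles would leave none.")
    # Keep only the smallest n_keep values (the top ones are truncated).
    return pooled[:n_keep]


# Toy example: 2 critics with 5 quantile estimates each (made-up numbers).
critic_a = [1.0, 2.0, 3.0, 4.0, 5.0]
critic_b = [1.5, 2.5, 3.5, 4.5, 9.0]
kept = truncate_pooled_quantiles([critic_a, critic_b], top_quantiles_to_drop_per_net=2)
print(kept)
```

With two critics and two quantiles dropped per network, the four largest pooled values are removed, so the optimistic outlier `9.0` never reaches the target.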

QR-DQN

Train a Quantile Regression DQN (QR-DQN) agent on the CartPole environment.

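from sb3_contrib import QRDQN

# Predict 50 quantiles of the return distribution for each action.
policy_kwargs = dict(n_quantiles=50)
model = QRDQN("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
# Train for 10,000 environment steps, logging every 4 episodes.
model.learn(total_timesteps=10000, log_interval=4)
# Save the trained agent to qrdqn_cartpole.zip.
model.save("qrdqn_cartpole")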
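
The `n_quantiles` entry in `policy_kwargs` sets how many quantiles of the return distribution are predicted for each action. As a rough illustration of how such outputs can be turned into a greedy action, the sketch below averages the quantiles per action to obtain a Q-value estimate and then takes the argmax. The helper `greedy_action` and the numbers are hypothetical, and the example uses 4 quantiles instead of 50 to stay readable.

```python
def greedy_action(quantiles_per_action):
    """Return the index of the action with the highest mean quantile value.

    Hypothetical helper for illustration only; not part of sb3_contrib.
    """
    # The mean over quantiles approximates the expected return (Q-value).
    q_values = [sum(qs) / len(qs) for qs in quantiles_per_action]
    return max(range(len(q_values)), key=q_values.__getitem__)


# Toy example: 2 actions (as in CartPole), 4 made-up quantiles each.
quantiles = [
    [0.0, 1.0, 2.0, 3.0],  # action 0, mean 1.5
    [1.0, 1.0, 2.0, 4.0],  # action 1, mean 2.0
]
best = greedy_action(quantiles)
print(best)
```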