Appendix A Visual Reinforcement Learning Baselines

DrQ: This model-free, off-policy reinforcement learning algorithm is based on Soft Actor-Critic (SAC) [