0.1 Reinforcement Learning, A2C and V-trace