Small batch deep reinforcement learning