Towards on-sky adaptive optics control using reinforcement learning