A random measure approach to reinforcement learning in continuous time