Towards Formalizing Reinforcement Learning Theory