Deep Reinforcement Learning in HOL4