Recovering from Out-of-sample States via Inverse Dynamics in Reinforcement Learning