Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards