Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction