Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards