Enhancing Policy Gradient with the Polyak Step-Size Adaption