Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution