Adaptive Proximal Policy Optimization with Upper Confidence Bound