Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes