Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER