Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits