Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality