Solving General Noisy Inverse Problem via Posterior Sampling: A Policy Gradient Viewpoint