A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies