Policy Gradient With Value Function Approximation For Collective Multiagent Planning