Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs