Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent