Learning optimal environments using projected stochastic gradient ascent