Making Non-Stochastic Control (Almost) as Easy as Stochastic

Neural Information Processing Systems 

We attain the optimal $\widetilde{O}(\sqrt{T})$ regret when the dynamics are unknown to the learner, and $\mathrm{poly}(\log T)$ regret when they are known, provided that the cost functions are strongly convex (as in LQR).