Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
