Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning
