Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
