DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret
