Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials
