Pessimistic Risk-Aware Policy Learning in Contextual Bandits
