Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning
