OptimalAlgorithmsforStochasticContextual PreferenceBandits

Neural Information Processing Systems 

Yet, same as the classical setup, the goal is still to compete against the best context arm at each round.