On Learning to Rank Long Sequences with Contextual Bandits