Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits
