Provably Sample Efficient RLHF via Active Preference Optimization
