Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles
