Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF