Bi-Level Offline Policy Optimization with Limited Exploration