Supplementary Material to Bi-Level Offline Policy Optimization with Limited Exploration