Supplementary Material for BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning