Supplementary Material: Policy Learning Using Weak Supervision