Appendix: Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems