Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems