Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems