Propensity score models are better when post-calibrated
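
The statement above does not say which calibration method is meant. As a minimal sketch, assuming isotonic regression is used as the post-calibration step, the code below remaps raw propensity scores to observed treatment rates with the pool-adjacent-violators algorithm. The function names `fit_isotonic_calibrator` and `calibrate` and the toy data are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def fit_isotonic_calibrator(scores, treated):
    """Fit a non-decreasing step function from raw scores to treatment rates.

    Assumption: isotonic regression (pool-adjacent-violators) is the chosen
    post-calibration method; the source does not name one.
    """
    # Aggregate units that share the same raw score.
    totals = {}
    for s, t in zip(scores, treated):
        total, count = totals.get(s, (0.0, 0))
        totals[s] = (total + t, count + 1)

    # Each block holds [sum of treatment indicators, count, largest raw score].
    blocks = []
    for s in sorted(totals):
        total, count = totals[s]
        blocks.append([total, count, s])
        # Merge backwards while the block means are not strictly increasing.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means


def calibrate(score, uppers, means):
    """Map one raw score to its calibrated propensity."""
    # First block whose upper bound is >= score; clamp scores above the range.
    i = bisect_left(uppers, score)
    return means[min(i, len(means) - 1)]


# Hypothetical toy data: raw model scores and observed treatment indicators.
raw_scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.90, 0.95]
treated = [0, 1, 0, 0, 1, 0, 1, 1]

uppers, means = fit_isotonic_calibrator(raw_scores, treated)
for s in (0.20, 0.65, 0.99):
    print(s, calibrate(s, uppers, means))
```

In practice the calibrator would be fit on data held out from the propensity model's training set. Calibrated values of exactly 0 or 1 would also need clipping before being used in inverse-probability weights.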