Evaluating Model Bias Requires Characterizing its Mistakes