Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models