Beyond Performance: Quantifying and Mitigating Label Bias in LLMs