Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance