Scrambled text: training Language Models to correct OCR errors using synthetic data