Improving OCR using internal document redundancy