Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation