Vision Language Models Are Not (Yet) Spelling Correctors