LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding