Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation