Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
