Unicorn: Text-Only Data Synthesis for Vision Language Model Training