CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning