Improving face generation quality and prompt following with synthetic captions