Evaluating Text-to-Visual Generation with Image-to-Text Generation