What You See is What You Read? Improving Text-Image Alignment Evaluation