What makes a good metric? Evaluating automatic metrics for text-to-image consistency