Are Checklists Really Useful for Automatic Evaluation of Generative Tasks?