Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models