Reference-based Metrics Disprove Themselves in Question Generation