How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics