Token-Level Fitting Issues of Seq2seq Models