Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation