Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective