Augmenting Self-attention with Persistent Memory