What does self-attention learn from Masked Language Modelling?