Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Yibing Song