Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective