What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?