How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey