Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks