Unified Video-Language Pre-training with Synchronized Audio