Large-Scale Bidirectional Training for Zero-Shot Image Captioning