Meta-learning For Vision-and-language Cross-lingual Transfer