Vision Language Transformers: A Survey