Multimodal Graph Transformer for Multimodal Question Answering