VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
