Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention
