Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
