Transformer based Multitask Learning for Image Captioning and Object Detection

Open in new window