LocCa: Visual Pretraining with Location-aware Captioners
