LocCa: Visual Pretraining with Location-aware Captioners Bo Wan 1,3 Michael Tschannen 1 Yongqin Xian