LocCa: Visual Pretraining with Location-aware Captioners