Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers