PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
