Fine-Grained Semantically Aligned Vision-Language Pre-Training