COSA: Concatenated Sample Pretrained Vision-Language Foundation Model