Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment