UIBert: Learning Generic Multimodal Representations for UI Understanding