Visual Conceptual Blending with Large-scale Language and Vision Models