Learning Multimodal Word Representation via Dynamic Fusion Methods