Unsupervised Document and Template Clustering using Multimodal Embeddings