Autoencoding beyond pixels using a learned similarity metric