On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss