Variational Transformer: A Framework Beyond the Trade-off between Accuracy and Diversity for Image Captioning