Convolutional Embedding Makes Hierarchical Vision Transformer Stronger