MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion