Manboformer: Learning Gaussian Representations via Spatial-temporal Attention Mechanism