Learning from Visual Observation via Offline Pretrained State-to-Go Transformer