Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction