What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding