Focused Transformer: Contrastive Training for Context Scaling