Focused Transformer: Contrastive Training for Context Scaling

Neural Information Processing Systems 

Large language models have an exceptional ability to incorporate new information supplied in their context.