A Meta-Learning Perspective on Transformers for Causal Language Modeling