Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

Neural Information Processing Systems 

Transformer architectures can solve unseen tasks from the input-output pairs in a given prompt, a capability known as in-context learning (ICL). Existing theoretical studies of ICL have mainly focused on linear regression tasks, often with i.i.d.